Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Is there any value in having a blank robots.txt file?
-
I've read an audit where the writer recommended creating and uploading a blank robots.txt file, there was no current file in place. Is there any merit in having a blank robots.txt file?
What is the minimum you would include in a basic robots.txt file?
-
I know this is four years old, but there's value in having a blank robots.txt as some tools (including the latest version of the Moz crawler) will baulk at sites without a robots.txt file.
-
Thanks for both of your replies. As per my question it was around whether there is any value having a blank robots.txt file. Philipp's answer was right on the money.
-
i mentioned same only, The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site."
n has added - More and more people use robots,txt to disallow access to some administration or private folders of the site
-
No use in having a blank robots.txt. Minimum requirement if you want to have your site crawled is this:
User-agent: * Allow: /
Note that Gagans example above will block the entire site.
-
Hi, This is what i got
" Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called_The Robots Exclusion Protocol_. It works likes this: a robot wants to vists a Web site URL, say http://www.example.com/welcome.html. Before it does so, it firsts checks for http://www.example.com/robots.txt, and finds:
User-agent: * Disallow: /
The "<tt>User-agent: *</tt>" means this section applies to all robots. The "<tt>Disallow: /</tt>" tells the robot that it should not visit any pages on the site."
More and more people use robots,txt to disallow access to some administration or private folders of the site . If you dont want to hide anything then may be you can leave it blank
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Multiple robots.txt files on server
Hi! I have previously hired a developer to put up my site and noticed afterwards that he did not know much about SEO. This lead me to starting to learn myself and applying some changes step by step. One of the things I am currently doing is inserting sitemap reference in robots.txt file (which was not there before). But just now when I wanted to upload the file via FTP to my server I found multiple ones - in different sizes - and I dont know what to do with them? Can I remove them? I have downloaded and opened them and they seem to be 2 textfiles and 2 dupplicates. Names: robots.txt (original dupplicate)
Technical SEO | | mjukhud
robots.txt-Original (original)
robots.txt-NEW (other content)
robots.txt-Working (other content dupplicate) Would really appreciate help and expertise suggestions. Thanks!0 -
Do web design footer links of websites you build have value?
Hi everyone. I am trying to build up DA for my site and create linking opportunities with my clients sites but I am not seeing any link value. I just did a redesign with another firm and we built out www.denbow.com . We have links to our sites in the footer but for some reason it's not being indexed. Can someone help me understand if it is good to put built by a href link in the footer? I've built almost 12 sites in my first 1.5 years of being in business for myself and I thought the links would pass some sort of value. Thanks in advance for the help and education. Regards, Noob Gary
Technical SEO | | gdavey0 -
Is there a limit to how many URLs you can put in a robots.txt file?
We have a site that has way too many urls caused by our crawlable faceted navigation. We are trying to purge 90% of our urls from the indexes. We put no index tags on the url combinations that we do no want indexed anymore, but it is taking google way too long to find the no index tags. Meanwhile we are getting hit with excessive url warnings and have been it by Panda. Would it help speed the process of purging urls if we added the urls to the robots.txt file? Could this cause any issues for us? Could it have the opposite effect and block the crawler from finding the urls, but not purge them from the index? The list could be in excess of 100MM urls.
Technical SEO | | kcb81780 -
Google indexing despite robots.txt block
Hi This subdomain has about 4'000 URLs indexed in Google, although it's blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch This has been the case for almost a year now, and it does not look like Google tends to respect the blocking in http://www1.swisscom.ch/robots.txt Any clues why this is or what I could do to resolve it? Thanks!
Technical SEO | | zeepartner0 -
Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?
I've got several URL's that I need to disallow in my robots.txt file. For example, I've got several documents that I don't want indexed and filters that are getting flagged as duplicate content. Rather than typing in thousands of URL's I was hoping that wildcards were still valid.
Technical SEO | | mkhGT0 -
Remove html file extension and 301 redirects
Hi Recently I ask for some work done on my website from a company, but I am not sure what they've done is right.
Technical SEO | | ulefos
What I wanted was html file extensions to be removed like
/ash-logs.html to /ash-logs
also the index.html to www.timports.co.uk
I have done a crawl diagnostics and have duplicate page content and 32 page title duplicates. This is so doing my head in please help This is what is in the .htaccess file <ifmodule pagespeed_module="">ModPagespeed on
ModPagespeedEnableFilters extend_cache,combine_css, collapse_whitespace,move_css_to_head, remove_comments</ifmodule> <ifmodule mod_headers.c="">Header set Connection keep-alive</ifmodule> <ifmodule mod_rewrite.c="">Options +FollowSymLinks -MultiViews</ifmodule> DirectoryIndex index.html RewriteEngine On
# Rewrite valid requests on .html files RewriteCond %{REQUEST_FILENAME}.html -f RewriteRule ^ %{REQUEST_URI}.html?rw=1 [L,QSA]
# Return 404 on direct requests against .html files RewriteCond %{REQUEST_URI} .html$
RewriteCond %{QUERY_STRING} !rw=1 [NC]
RewriteRule ^ - [R=404] AddCharset UTF-8 .html # <filesmatch “.(js|css|html|htm|php|xml|swf|flv|ashx)$”="">#SetOutputFilter DEFLATE #</filesmatch> <ifmodule mod_expires.c="">ExpiresActive On
ExpiresByType image/gif "access plus 1 years"
ExpiresByType image/jpeg "access plus 1 years"
ExpiresByType image/png "access plus 1 years"
ExpiresByType image/x-icon "access plus 1 years"
ExpiresByType image/jpg "access plus 1 years"
ExpiresByType text/css "access 1 years"
ExpiresByType text/x-javascript "access 1 years"
ExpiresByType application/javascript "access 1 years"
ExpiresByType image/x-icon "access 1 years"</ifmodule> <files 403.shtml="">order allow,deny allow from all</files> redirect 301 /PRODUCTS http://www.timports.co.uk/kiln-dried-logs
redirect 301 /kindling_firewood.html http://www.timports.co.uk/kindling-firewood.html
redirect 301 /about_us.html http://www.timports.co.uk/about-us.html
redirect 301 /log_delivery.html http://www.timports.co.uk/log-delivery.html redirect 301 /oak_boards_delivery.html http://www.timports.co.uk/oak-boards-delivery.html
redirect 301 /un_edged_oak_boards.html http://www.timports.co.uk/un-edged-oak-boards.html
redirect 301 /wholesale_logs.html http://www.timports.co.uk/wholesale-logs.html redirect 301 /privacy_policy.html http://www.timports.co.uk/privacy-policy.html redirect 301 /payment_failed.html http://www.timports.co.uk/payment-failed.html redirect 301 /payment_info.html http://www.timports.co.uk/payment-info.html1 -
Allow or Disallow First in Robots.txt
If I want to override a Disallow directive in robots.txt with an Allow command, do I have the Allow command before or after the Disallow command? example: Allow: /models/ford///page* Disallow: /models////page
Technical SEO | | irvingw0 -
Subdomain Removal in Robots.txt with Conditional Logic??
I would like to see if there is a way to add conditional logic to the robots.txt file so that when we push from DEV to PRODUCTION and the robots.txt file is pushed, we don't have to remember to NOT push the robots.txt file OR edit it when it goes live. My specific situation is this: I have www.website.com, dev.website.com and new.website.com and somehow google has indexed the DEV.website.com and NEW.website.com and I'd like these to be removed from google's index as they are causing duplicate content. Should I: a) add 2 new GWT entries for DEV.website.com and NEW.website.com and VERIFY ownership - if I do this, then when the files are pushed to LIVE won't the files contain the VERIFY META CODE for the DEV version even though it's now LIVE? (hope that makes sense) b) write a robots.txt file that specifies "DISALLOW: DEV.website.com/" is that possible? I have only seen examples of DISALLOW with a "/" in the beginning... Hope this makes sense, can really use the help! I'm on a Windows Server 2008 box running ColdFusion websites.
Technical SEO | | ErnieB0