Is there a reason to set a crawl-delay in the robots.txt?
-
I've recently encountered a site that has a crawl-delay directive set in its robots.txt file. I've never seen a need for this, since you can control Googlebot's crawl rate in Google Webmaster Tools. They have the directive set for all crawlers, which seems odd to me. What are some reasons someone would want to set it like that? I haven't been able to find any good information on it while researching.
-
Google does not support the crawl-delay directive, but you can lower your crawl rate inside Google Webmaster Central.
So you are right to handle it the way you do. If the directive is in your robots.txt, Googlebot simply ignores it, and Google Webmaster Tools will flag it in the console as an unsupported rule.
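For crawlers that do honor the directive (Bing and Yandex have historically respected Crawl-delay, though support varies and Google ignores it), a minimal robots.txt sketch might look like the following; the ten-second value is purely illustrative:

User-agent: *
Crawl-delay: 10

Crawlers that support it treat the value as the number of seconds to wait between requests, so a large delay can significantly slow how quickly a big site gets recrawled, which is one reason to set it only for bots that are genuinely hammering the server.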
Related Questions
-
Should I disallow crawl of my Job board?
The Moz crawler is telling me we have loads of duplicate content issues. We use a job board plugin on our WordPress site and we have a lot of duplicate or very similar jobs (usually just a different location), but the plugin doesn't allow us to add rel canonical tags to the individual jobs. Should I disallow the /jobs/ URL in the robots.txt file? This would solve the duplicate content issue, but then Google won't be able to crawl any of the individual job listings. Has anyone worked with a job board plugin on WordPress and had a similar issue, or can anyone advise on how best to solve our duplicate content? Thanks 🙂
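For illustration only (the domain and job slug below are placeholders, not taken from the site), the two options look roughly like this. Blocking the whole section in robots.txt:

User-agent: *
Disallow: /jobs/

Versus a canonical link element in the head of each near-duplicate job page, if the plugin or theme can be modified to output one:

<link rel="canonical" href="http://www.example.com/jobs/account-manager-london/" />

Keep in mind that Disallow stops crawling but does not reliably stop indexing of URLs that are linked from elsewhere, which is one reason the canonical approach is generally preferred when it is available.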
Technical SEO | O2C
-
What are the negative implications of listing URLs in a sitemap that are then blocked in the robots.txt?
In running a crawl of a client's site I can see several URLs listed in the sitemap that are then blocked in the robots.txt file. Other than perhaps wasting crawl budget, are there any negative implications?
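To make the conflict concrete, here is a hedged sketch using placeholder paths rather than the client's actual URLs. The sitemap advertises a URL:

<url>
  <loc>http://www.example.com/private/page.html</loc>
</url>

while robots.txt tells crawlers not to fetch it:

User-agent: *
Disallow: /private/

In practice, Webmaster Tools tends to flag these as sitemap warnings (URLs blocked by robots.txt), and a blocked URL that is linked from elsewhere can still be indexed without a snippet, so the main downsides are mixed signals and noisier reports rather than any direct penalty.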
Technical SEO | richdan
-
Have I constructed my robots.txt file correctly for sitemap autodiscovery?
Hi, here is my robots.txt:

User-agent: *
Sitemap: http://www.bedsite.co.uk/sitemaps/sitemap.xml

Directories
Disallow: /sendfriend/
Disallow: /catalog/product_compare/
Disallow: /media/catalog/product/cache/
Disallow: /checkout/
Disallow: /categories/
Disallow: /blog/index.php/
Disallow: /catalogsearch/result/index/
Disallow: /links.html

I'm using Magento and want to make sure I have constructed my robots.txt file correctly for sitemap autodiscovery. Thanks.
Technical SEO | Bedsite
-
"Extremely high number of URLs" warning for robots.txt blocked pages
I have a section of my site that is exclusively for tracking redirects for paid ads. All URLs under this path do a 302 redirect through our ad tracking system:

http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567 --302--> http://www.mysite.com/blue-widgets

This path of the site is blocked by our robots.txt, and none of the pages show up for a site: search.

User-agent: *
Disallow: /trackingredirect

However, I keep receiving messages in Google Webmaster Tools about an "extremely high number of URLs", and the URLs listed are in my redirect directory, which is ostensibly not indexed. If not by robots.txt, how can I keep Googlebot from wasting crawl time on these millions of /trackingredirect/ links?
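One hedged sketch (the attribute placement is illustrative; the URL is the example already given above): because robots.txt only stops crawling, Googlebot can still discover the URLs from the on-page ad links and report them, so adding rel="nofollow" to those links can reduce how many of them get queued for discovery in the first place:

<a href="http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567" rel="nofollow">Blue widgets</a>

With the Disallow rule in place Googlebot is not actually fetching the redirects; the "extremely high number of URLs" message is generally informational and refers to URLs it has discovered, not crawl time being spent on them.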
Technical SEO | EhrenReilly
-
Crawl Diagnostics Report 500 error
How can I find out what is causing my website to return 500 errors, and how do I locate and fix them?
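A 500 is a server-side error, so the server's error log is usually the first place to look. As a hedged example, assuming an Apache server with default log locations (paths vary by host and are not taken from the question):

tail -n 100 /var/log/apache2/error.log
grep ' 500 ' /var/log/apache2/access.log | tail -n 20

The first command shows the most recent error messages; the second lists recent requests that returned a 500 status so you can see which URLs trigger it. On shared hosting the same logs are normally exposed through the control panel, and the error message typically points to the plugin, script, rewrite rule, or resource limit responsible.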
Technical SEO | Joseph-Green-SEO
-
Setting preferred domain as www or non-www
Way back before Panda I used to rank pretty well for certain keywords. Of course, like many others, I lost some of those rankings after Panda. I have been recovering since then, so it's not that bad. I was poking around in Google Webmaster Tools and noticed something I need some clarification on. Some history: my site freescrabbledictionary.com used to be indexed as non-www. Then some time ago (I can't remember when) I set the preferred domain to www. Tonight I was looking through Webmaster Tools and saw something that did not make sense to me. In the Content Keywords section (keywords listed by significance), the non-www version shows:

1. scrabble
2. words (2 variants)
3. dictionary
4. cheat
5. finder
6. friends
7. maker (2 variants)
8. noun
9. letter (2 variants)
10. hasbro
11. mattel
12. spear
13. found (2 variants)
14. sowpods
15. freescrabbledictionary
16. builder
17. affiliated
18. search
19. solver
20. lists

Then I looked at the www version, and its list is:

1. words (3 variants)
2. scrabble (2 variants)
3. letter (4 variants)
4. points
5. cheat (3 variants)
6. friends (2 variants)
7. finder (2 variants)
8. anagram (2 variants)
9. dictionary
10. tool (2 variants)
11. hasbro
12. mattel
13. spear
14. game (4 variants)
15. mobile
16. affiliated (3 variants)
17. berkshire
18. canada
19. calculations (5 variants)
20. coming (4 variants)

The non-www version has the order I want (especially the first five keywords); the www version is nowhere near it. If I change back to the non-www version, could I possibly see a change in rank? Or could changing it hurt? I am starting to think I shot myself in the foot when I switched...
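Whichever version is kept, the usual companion step is a site-wide 301 redirect so that only one hostname is crawled and indexed. A minimal sketch for Apache, assuming the site can use an .htaccess file with mod_rewrite and that the www version is preferred (swap the hostnames to prefer non-www; the server setup is an assumption, not taken from the question):

RewriteEngine On
RewriteCond %{HTTP_HOST} ^freescrabbledictionary\.com$ [NC]
RewriteRule ^(.*)$ http://www.freescrabbledictionary.com/$1 [R=301,L]

Combined with the preferred domain setting in Google Webmaster Tools and consistent internal linking, this consolidates signals onto one version instead of splitting them between the two hostnames.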
Technical SEO | cbielich
-
Client accidentally blocked entire site with robots.txt for a week
Our client was having a design firm do some website development work for them. The work was done on a staging server that was blocked with a robots.txt to prevent duplicate content issues. Unfortunately, when the design firm made the changes live, they also moved over the robots.txt file, which blocked the good, live site from search for a full week. We saw the error (!) as soon as the latest crawl report came in. The error has been corrected, but... Does anyone have any experience with a snafu like this? Any idea how long it will take for the damage to be reversed and the site to get back in the good graces of the search engines? Are there any steps we should take in the meantime that would help to rectify the situation more quickly? Thanks for all of your help.
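For reference, the staging block and the corrected live file differ by a single character; the sketch below is generic rather than the client's actual files. A staging robots.txt that blocks all crawlers:

User-agent: *
Disallow: /

And a live robots.txt that allows everything:

User-agent: *
Disallow:

Once the correct file is live, resubmitting the XML sitemap and using Fetch as Google in Webmaster Tools can encourage faster recrawling. In most reported cases a block of this length causes a temporary dip in indexing and rankings that recovers over days to a few weeks as pages are recrawled, rather than any lasting penalty.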
Technical SEO | pixelpointpress