Should we use Google's crawl delay setting?
-
We’ve been noticing a huge uptick in Google’s spidering lately, and along with it a notable worsening of render times.
Yesterday, for example, Google spidered our site at a ratio of 30:1 (Google spider vs. organic traffic). In other words, for every organic page request, Google hits the site 30 times.
Our render times have lengthened to an average of 2 seconds (and up to 2.5 seconds). Before Google's renewed interest in us, we were seeing average render times closer to one second, and often half that.
A year ago, the ratio of Spider to Organic was between 6:1 and 10:1.
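(For anyone who wants to reproduce a measurement like this, here is a minimal sketch that tallies spider vs. organic hits from a combined-format access log. The log path, the user-agent pattern, and the assumption that every non-Googlebot request is "organic" are simplifications for illustration, not the exact method used here.)

import re
import sys

# Combined log format puts the user-agent in the final quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')
BOT_RE = re.compile(r"googlebot", re.IGNORECASE)

googlebot = organic = 0
with open(sys.argv[1]) as log:
    for line in log:
        match = UA_RE.search(line)
        user_agent = match.group(1) if match else ""
        if BOT_RE.search(user_agent):
            googlebot += 1
        else:
            organic += 1  # naive: lumps other bots in with real visitors

print(f"Googlebot: {googlebot}, other: {organic}, "
      f"ratio ~ {googlebot / max(organic, 1):.0f}:1")

Usage: python ratio.py access.log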
Is requesting a crawl-delay from Googlebot a viable option?
Our goal is simply to reduce Googlebot traffic and, hopefully, improve render times and organic traffic.
Thanks,
Trisha
-
Unfortunately, you can't change Google's crawl rate in a robots.txt file; Googlebot simply ignores the Crawl-delay directive. The best way to rate-limit it is with the custom crawl rate setting in Google Webmaster Tools (look under Site configuration > Settings).
You might also consider using your load balancer to direct Google (and other search engines) to a cordoned-off group of servers (app, db, cache, search), ensuring your users aren't inadvertently hit by performance issues caused by overzealous bot crawling.
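As a rough illustration of that routing idea, here is what a user-agent split might look like in an nginx front end. The pool names, addresses, and bot regex below are placeholders, not details from your setup:

# Pool that serves normal visitors.
upstream organic_pool {
    server 10.0.0.10:8080;
    server 10.0.0.11:8080;
}

# Cordoned-off pool that absorbs crawler load.
upstream bot_pool {
    server 10.0.0.20:8080;
}

# Choose a pool by User-Agent (goes in the http context).
map $http_user_agent $pool {
    default                  organic_pool;
    "~*(googlebot|bingbot)"  bot_pool;
}

server {
    listen 80;
    location / {
        proxy_pass http://$pool;
    }
}

Because the variable resolves to a named upstream, nginx routes each request to the matching pool, so an aggressive crawl degrades only the bot pool rather than the visitor-facing servers.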
-
We're a publisher, which means that, as an industry, our normal render times are always at the top of the chart. Ads are notoriously slow to load, and that's how we earn our keep. These results are bad, though, even for publishing.
We're serving millions of uniques a month, on a bank of dedicated servers hosted off site, load balanced, etc.
-
More info on that here: http://www.robotstxt.org/
-
Wow, those are really high render times! Have you considered moving to another web server? Nginx is pretty damn fast and could probably bring those render times down. Also, are you on a shared host, or is this a dedicated server?
What you're looking for, though, is the robots.txt file, and you'll want to add some lines like this:
User-agent: *
Disallow:
Crawl-delay: 10

User-agent: ia_archiver
Disallow: /

User-agent: Ask Jeeves
Crawl-delay: 120

User-agent: Teoma
Disallow: /html/
Crawl-delay: 120
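One caveat: Crawl-delay only restrains bots that actually honor robots.txt, and as noted above, Googlebot ignores the directive entirely (use the Webmaster Tools setting for Google). Plenty of scrapers also spoof Googlebot's user agent, so before you rate-limit or reroute by user agent, it's worth confirming a hit really comes from Google. Below is a minimal sketch of the reverse-DNS-plus-forward-confirm check Google recommends; the sample IP is just an illustration:

import socket

def is_real_googlebot(ip: str) -> bool:
    """True if ip's PTR record ends in googlebot.com or google.com
    and a forward lookup of that hostname maps back to the same IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:  # no PTR record, or the lookup failed
        return False

print(is_real_googlebot("66.249.66.1"))  # sample address, for illustration

In practice you'd cache the result per IP; two DNS lookups per request is far too expensive to do inline.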