Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Googlebot Crawl Rate causing site slowdown
-
I am hearing from my IT department that Googlebot is causing as massive slowdown/crash our site. We get 3.5 to 4 million pageviews a month and add 70-100 new articles on the website each day. We provide daily stock research and marke analysis, so its all high quality relevant content. Here are the crawl stats from WMT:
I have not worked with a lot of high volume high traffic sites before, but these crawl stats do not seem to be out of line. My team is getting pressure from the sysadmins to slow down the crawl rate, or block some or all of the site from GoogleBot.
Do these crawl stats seem in line with sites? Would slowing down crawl rates have a big effect on rankings?
Thanks
-
Similar to Michael, my IT team is saying Googlebot is causing performance issues - specifically during peak hours.
It was suggested that we consider using apache re-write rules to serve Googlebot a 503 during our peak hours to limit the impact. I found the stackoverflow thread (link below) in which John Muller seems to suggest this approach, but has anyone tried this?
-
Blocking googlebot is a quick and easy way to disappear from the Index. Not an option if you want Google to rank your site.
For smaller sites or ones with limited technologies, I sometimes recommend using a crawl-delay directive in robots.txt
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=48620
But I agree with both Shane and Zachary, this doesn't seem like the long term answer to your problems. Your crawl stats don't seem out of line for a site of your size, and perhaps a better hardware configuration could help things out.
With 70 new articles each day, I'd want Google crawling my site as much as they pleased.
-
whatever Google's default is in GWT - It sets it for you.
You can change it, but it is not reccomended unless for a specific reason (such as Michael Lewis's specific scenario) even though, I am not completely sold that Gbot is what is causing the "dealbreaking" overhead.
-
what is the ideal setting on the crawler. i have been wondering about this for some time.
-
Hi,
Your admins saying that, is like someone saying "we need to shut the site down, we are getting to much traffic!" Common sys-admin response (fix it somewhere else)
4GB a day downloaded, is alot of Bot traffic, but it appears you are a "real time" site, that is probably actually helped and maybe even reliant on your high crawl rate....
I would upgrade hardware - or even look into some kind of off site cloud redundancy for failover (Hybrid)
I highly doubt that 4GB a day, is a "dealbreaker",but of course that is just based off the one image, and your admins probably have resource monitors - Maybe Varnish is an answer for static content to help lighten load???? Or CDN for file hosting to lighten bandwidth load?
Shane
-
We are hosting the site on our own hardware at a big colo. I know that we are upgrading servers but they will not be online until the end of July.
Thanks!
-
I wouldn't slow the crawl rate. A high crawl rate is good so that Google can keep their index of your website current.
The better solution is to reconsider your hardware and networking setup. Do you know how you are being hosted? From my own experience with a website of that size, a load balancer on two decent dedicated servers should handle the load without problems. Google crawling your pages shouldn't create noticeable overhead on the right setup.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Intermittent 404 - What causes them and how to fix?
Hi! I'm working on a client site at the moment and I've discovered a couple of pages that are 404ing but producing a 200 OK response. However, I have checked these URLs again and some are now producing a 404 Error response. No changes have been made (that I'm aware of) so it appears that the URLs are returning both 200 OK and 404 Error responses intermittently. Any ideas what could cause this and the best solution? Thanks!
Technical SEO | | daniel-brooks0 -
Help Setting Up 301 Redirects from Coldfusion Site to Wordpress Site.
I have created a new website and need to redirect all of the previous pages to the new one. The old website was built in coldfusion and the new site is built in wordpress. One of the pages I'm trying to redirect is www.norriseal.com/products.cfm to http://norrisealwellmark.com/products/. This is what I have in my .htaccess file <ifmodule mod_rewrite.c="">Options +FollowSymlinks
Technical SEO | | MarketHubb
RewriteEngine On
RewriteBase /
Redirect 301 /products.cfm http://norrisealwellmark.com/products/</ifmodule> The result of this redirect is http://norrisealwellmark.com/products.cfm How do I prevent the .cfm from appending to the destination URL?1 -
Mobile site ranking instead of/as well as desktop site in desktop SERPS
I have just noticed that the mobile version of my site is sometimes ranking in the desktop serps either instead of as well as the desktop site. It is not something that I have noticed in the past as it doesn't happen with the keywords that I track, which are highly competitive. It is happening for results that include our brand name, e.g '[brand name][search term]'. The mobile site is served with mobile optimised content from another URL. e.g wwww.domain.com/productpage redirects to m.domain.com/productpage for mobile. Sometimes I am only seen the mobile URL in the desktop SERPS, other times I am seeing both the desktop and mobile URL for the same product. My understanding is that the mobile URL should not be ranking at all in desktop SERPS, could we be being penalised for either bad redirects or duplicate content? Any ideas as to how I could further diagnose and solve the problem if you do believe that it could be harming rankings?
Technical SEO | | pugh0 -
Way to spider Wordpress site
I have an old Wordpress site and I want to move it to a new server and take it off Wordpress (too many hacks). I am trying to spider the site so as to get static, non-Wordpress, pages. I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/ the URL I get out of the spider is /page/index.html And those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects. I tried using these spidering programs: WinHTTack Website Copier and PageNest Does anyone know of another method of turning a Wordpress site into a non Wordpress site?
Technical SEO | | DanCrean0 -
How does Google Crawl Multi-Regional Sites?
I've been reading up on this on Webmaster Tools but just wanted to see if anyone could explain it a bit better. I have a website which is going live soon which is going to be set up to redirect to a localised URL based on the IP address i.e. NZ IP ranges will go to .co.nz, Aus IP addresses would go to .com.au and then USA or other non-specified IP addresses will go to the .com address. There is a single CMS installation for the website. Does this impact the way in which Google is able to search the site? Will all domains be crawled or just one? Any help would be great - thanks!
Technical SEO | | lemonz0 -
Javascript to manipulate Google's bounce rate and time on site?
I was referred to this "awesome" solution to high bounce rates. It is suppose to "fix" bounce rates and lower them through this simple script. When the bounce rate goes way down then rankings dramatically increase (interesting study but not my question). I don't know javascript but simply adding a script to the footer and watch everything fall into place seems a bit iffy to me. Can someone with experience in JS help me by explaining what this script does? I think it manipulates the reporting it does to GA but I'm not sure. It was supposed to be placed in the footer of the page and then sit back and watch the dollars fly in. 🙂
Technical SEO | | BenRWoodard1 -
When is the last time Google crawled my site
How do I tell the last time Google crawled my site. I found out it is not the "Cache" which I had thought it was.
Technical SEO | | digitalops0 -
Crawling image folders / crawl allowance
We recently removed /img and /imgp from our robots.txt file thus allowing googlebot to crawl our image folders. Not sure why we had these blocked in the first place, but we opened them up in response to an email from Google Product Search about not being able to crawl images - which can/has hurt our traffic from Google Shopping. My question is: will allowing Google to crawl our image files eat up our 'crawl allowance'? We wouldn't want Google to not crawl/index certain pages, and ding our organic traffic, because more of our allotted crawl bandwidth is getting chewed up crawling image files. Outside of the non-detailed crawl stat graphs from Webmaster Tools, what's the best way to check how frequently/ deeply our site is getting crawled? Thanks all!
Technical SEO | | evoNick0