Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Googlebot Crawl Rate causing site slowdown
-
I am hearing from my IT department that Googlebot is causing as massive slowdown/crash our site. We get 3.5 to 4 million pageviews a month and add 70-100 new articles on the website each day. We provide daily stock research and marke analysis, so its all high quality relevant content. Here are the crawl stats from WMT:
I have not worked with a lot of high volume high traffic sites before, but these crawl stats do not seem to be out of line. My team is getting pressure from the sysadmins to slow down the crawl rate, or block some or all of the site from GoogleBot.
Do these crawl stats seem in line with sites? Would slowing down crawl rates have a big effect on rankings?
Thanks
-
Similar to Michael, my IT team is saying Googlebot is causing performance issues - specifically during peak hours.
It was suggested that we consider using apache re-write rules to serve Googlebot a 503 during our peak hours to limit the impact. I found the stackoverflow thread (link below) in which John Muller seems to suggest this approach, but has anyone tried this?
-
Blocking googlebot is a quick and easy way to disappear from the Index. Not an option if you want Google to rank your site.
For smaller sites or ones with limited technologies, I sometimes recommend using a crawl-delay directive in robots.txt
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=48620
But I agree with both Shane and Zachary, this doesn't seem like the long term answer to your problems. Your crawl stats don't seem out of line for a site of your size, and perhaps a better hardware configuration could help things out.
With 70 new articles each day, I'd want Google crawling my site as much as they pleased.
-
whatever Google's default is in GWT - It sets it for you.
You can change it, but it is not reccomended unless for a specific reason (such as Michael Lewis's specific scenario) even though, I am not completely sold that Gbot is what is causing the "dealbreaking" overhead.
-
what is the ideal setting on the crawler. i have been wondering about this for some time.
-
Hi,
Your admins saying that, is like someone saying "we need to shut the site down, we are getting to much traffic!" Common sys-admin response (fix it somewhere else)
4GB a day downloaded, is alot of Bot traffic, but it appears you are a "real time" site, that is probably actually helped and maybe even reliant on your high crawl rate....
I would upgrade hardware - or even look into some kind of off site cloud redundancy for failover (Hybrid)
I highly doubt that 4GB a day, is a "dealbreaker",but of course that is just based off the one image, and your admins probably have resource monitors - Maybe Varnish is an answer for static content to help lighten load???? Or CDN for file hosting to lighten bandwidth load?
Shane
-
We are hosting the site on our own hardware at a big colo. I know that we are upgrading servers but they will not be online until the end of July.
Thanks!
-
I wouldn't slow the crawl rate. A high crawl rate is good so that Google can keep their index of your website current.
The better solution is to reconsider your hardware and networking setup. Do you know how you are being hosted? From my own experience with a website of that size, a load balancer on two decent dedicated servers should handle the load without problems. Google crawling your pages shouldn't create noticeable overhead on the right setup.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does anyone know the linking of hashtags on Wix sites does it negatively or postively impact SEO. It is coming up as an error in site crawls 'Pages with 404 errors' Anyone got any experience please?
Does anyone know the linking of hashtags on Wix sites does it negatively or positively impact SEO. It is coming up as an error in site crawls 'Pages with 404 errors' Anyone got any experience please? For example at the bottom of this blog post https://www.poppyandperle.com/post/face-painting-a-global-language the hashtags are linked, but they don't go to a page, they go to search results of all other blogs using that hashtag. Seems a bit of a strange approach to me.
Technical SEO | | Mediaholix0 -
Help Setting Up 301 Redirects from Coldfusion Site to Wordpress Site.
I have created a new website and need to redirect all of the previous pages to the new one. The old website was built in coldfusion and the new site is built in wordpress. One of the pages I'm trying to redirect is www.norriseal.com/products.cfm to http://norrisealwellmark.com/products/. This is what I have in my .htaccess file <ifmodule mod_rewrite.c="">Options +FollowSymlinks
Technical SEO | | MarketHubb
RewriteEngine On
RewriteBase /
Redirect 301 /products.cfm http://norrisealwellmark.com/products/</ifmodule> The result of this redirect is http://norrisealwellmark.com/products.cfm How do I prevent the .cfm from appending to the destination URL?1 -
Why Can't Googlebot Fetch Its Own Map on Our Site?
I created a custom map using google maps creator and I embedded it on our site. However, when I ran the fetch and render through Search Console, it said it was blocked by our robots.txt file. I read in the Search Console Help section that: 'For resources blocked by robots.txt files that you don't own, reach out to the resource site owners and ask them to unblock those resources to Googlebot." I did not setup our robtos.txt file. However, I can't imagine it would be setup to block google from crawling a map. i will look into that, but before I go messing with it (since I'm not familiar with it) does google automatically block their maps from their own googlebot? Has anyone encountered this before? Here is what the robot.txt file says in Search Console: User-agent: * Allow: /maps/api/js? Allow: /maps/api/js/DirectionsService.Route Allow: /maps/api/js/DistanceMatrixService.GetDistanceMatrix Allow: /maps/api/js/ElevationService.GetElevationForLine Allow: /maps/api/js/GeocodeService.Search Allow: /maps/api/js/KmlOverlayService.GetFeature Allow: /maps/api/js/KmlOverlayService.GetOverlays Allow: /maps/api/js/LayersService.GetFeature Disallow: / Any assistance would be greatly appreciated. Thanks, Ruben
Technical SEO | | KempRugeLawGroup1 -
CDN Being Crawled and Indexed by Google
I'm doing a SEO site audit, and I've discovered that the site uses a Content Delivery Network (CDN) that's being crawled and indexed by Google. There are two sub-domains from the CDN that are being crawled and indexed. A small number of organic search visitors have come through these two sub domains. So the CDN based content is out-ranking the root domain, in a small number of cases. It's a huge duplicate content issue (tens of thousands of URLs being crawled) - what's the best way to prevent the crawling and indexing of a CDN like this? Exclude via robots.txt? Additionally, the use of relative canonical tags (instead of absolute) appear to be contributing to this problem as well. As I understand it, these canonical tags are telling the SEs that each sub domain is the "home" of the content/URL. Thanks! Scott
Technical SEO | | Scott-Thomas0 -
What is the best way to find missing alt tags on my site (site wide - not page by page)?
I am looking to find all the missing alt tags on my site at once. I have a FF extension that use to do it page by page, but my site is huge and that will take forever. Thanks!!
Technical SEO | | franchisesolutions1 -
Javascript to manipulate Google's bounce rate and time on site?
I was referred to this "awesome" solution to high bounce rates. It is suppose to "fix" bounce rates and lower them through this simple script. When the bounce rate goes way down then rankings dramatically increase (interesting study but not my question). I don't know javascript but simply adding a script to the footer and watch everything fall into place seems a bit iffy to me. Can someone with experience in JS help me by explaining what this script does? I think it manipulates the reporting it does to GA but I'm not sure. It was supposed to be placed in the footer of the page and then sit back and watch the dollars fly in. 🙂
Technical SEO | | BenRWoodard1 -
When is the last time Google crawled my site
How do I tell the last time Google crawled my site. I found out it is not the "Cache" which I had thought it was.
Technical SEO | | digitalops0 -
Delete old site but redirect domain to a new domain and site
I just have a quick query and I have a feeling about what the answer is so just wanted to see what you guys thought... Basically I am working on a client site. This client has a few other websites that are divisions of their company. However these divisions/websites are no longer used. They are wanting to delete the websites but redirect the domains to their name main website. They believe this will pass on SEO benefits as these old division sites are old and have a good PR and history. I'm unsure for DEFINITE, which way is correct?
Technical SEO | | Weerdboil0