Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Googlebot Crawl Rate causing site slowdown
-
I am hearing from my IT department that Googlebot is causing a massive slowdown, and even crashes, on our site. We get 3.5 to 4 million pageviews a month and add 70-100 new articles to the website each day. We provide daily stock research and market analysis, so it's all high-quality, relevant content. Here are the crawl stats from WMT:
I have not worked with a lot of high-volume, high-traffic sites before, but these crawl stats do not seem to be out of line. My team is getting pressure from the sysadmins to slow down the crawl rate, or to block some or all of the site from Googlebot.
Do these crawl stats seem in line with those of similar sites? Would slowing down the crawl rate have a big effect on rankings?
Thanks
-
Similar to Michael, my IT team is saying Googlebot is causing performance issues - specifically during peak hours.
It was suggested that we consider using Apache rewrite rules to serve Googlebot a 503 during our peak hours to limit the impact. I found the Stack Overflow thread (link below) in which John Mueller seems to suggest this approach, but has anyone tried this?
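To make the idea concrete, here is a minimal sketch of the "503 during peak hours" logic. It is written as Python WSGI middleware rather than as Apache rewrite rules, so it illustrates the decision being made, not the exact setup suggested to us. The peak window, the Retry-After value and the function names are all assumptions for illustration, and the User-Agent check is naive (the header can be spoofed).

```python
from datetime import datetime, time

# Hypothetical peak window in server local time; adjust to your own traffic.
PEAK_START = time(9, 0)
PEAK_END = time(16, 0)
RETRY_AFTER_SECONDS = 3600  # assumed back-off hint, not a recommended value


def is_googlebot(user_agent):
    """Naive check on the User-Agent string only (it can be spoofed)."""
    return "googlebot" in user_agent.lower()


def in_peak_hours(now):
    """True when the given datetime falls inside the peak window."""
    return PEAK_START <= now.time() < PEAK_END


def throttle_googlebot(app, clock=datetime.now):
    """Wrap a WSGI app so Googlebot receives a 503 during the peak window."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_googlebot(user_agent) and in_peak_hours(clock()):
            body = b"Service temporarily unavailable\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
                ("Retry-After", str(RETRY_AFTER_SECONDS)),
            ])
            return [body]
        # Everyone else, and Googlebot outside peak hours, gets the real app.
        return app(environ, start_response)
    return middleware
```

The `clock` parameter is only there so the time can be faked when testing. A temporary 503 tells the crawler to come back later, whereas blocking it outright does not, which is why it is the status code being discussed here.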
-
Blocking Googlebot is a quick and easy way to disappear from the index. Not an option if you want Google to rank your site.
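For smaller sites or ones with limited technologies, I sometimes recommend using a crawl-delay directive in robots.txt. Note that Googlebot itself ignores crawl-delay; for Google, the crawl rate is adjusted through the Webmaster Tools setting.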
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=48620
But I agree with both Shane and Zachary: this doesn't seem like the long-term answer to your problems. Your crawl stats don't seem out of line for a site of your size, and perhaps a better hardware configuration could help things out.
With 70 new articles each day, I'd want Google crawling my site as much as they pleased.
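For reference, this is roughly what a crawl-delay directive looks like and how a crawler that honors it would read it. It is only a sketch: the robots.txt content is made up, and the 10-second value is an arbitrary example. It uses Python's standard `urllib.robotparser`. Keep in mind that Googlebot does not honor crawl-delay, so this only affects crawlers that do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks bingbot to wait 10 seconds between requests
# and leaves every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay declared for bingbot in the file above.
bing_delay = parser.crawl_delay("bingbot")
# No delay is declared for other agents, so the parser has nothing to return.
other_delay = parser.crawl_delay("Googlebot")

print(bing_delay, other_delay)
```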
-
Whatever Google's default is in GWT. It sets it for you.
You can change it, but that is not recommended unless you have a specific reason (such as Michael Lewis's scenario). Even so, I am not completely sold that Googlebot is what is causing the "dealbreaking" overhead.
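One way to check whether Googlebot really is the source of the overhead is to count its requests per hour in the web server access logs and compare them with total traffic. The sketch below assumes the logs are in the common Apache/Nginx "combined" format. The function name is made up for illustration, and it matches on the User-Agent string only, so it would also count anything pretending to be Googlebot.

```python
import re
from collections import Counter

# Pulls the hour and the user agent out of a "combined" format log line.
LOG_LINE = re.compile(
    r'\[(?P<day>[^:]+):(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_share_by_hour(lines):
    """Return {hour: (googlebot_requests, total_requests)} for the log lines."""
    total = Counter()
    bot = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in combined format
        hour = match.group("hour")
        total[hour] += 1
        if "googlebot" in match.group("agent").lower():
            bot[hour] += 1
    return {hour: (bot[hour], total[hour]) for hour in sorted(total)}
```

Something like `googlebot_share_by_hour(open("access.log"))` (the file name is a placeholder) would then show whether the bot's share of requests is actually large during the hours the site struggles.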
-
what is the ideal setting on the crawler. i have been wondering about this for some time.
-
Hi,
Your admins saying that is like someone saying "we need to shut the site down, we are getting too much traffic!" It's a common sysadmin response (fix it somewhere else).
4GB a day downloaded is a lot of bot traffic, but it appears you are a "real-time" site that is probably helped by, and maybe even reliant on, your high crawl rate.
I would upgrade hardware, or even look into some kind of off-site cloud redundancy for failover (hybrid).
I highly doubt that 4GB a day is a "dealbreaker", but of course that is just based on the one image, and your admins probably have resource monitors. Maybe Varnish is an answer for static content to help lighten the load? Or a CDN for file hosting to lighten the bandwidth load?
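To put the 4GB a day figure in perspective, here is a quick back-of-the-envelope calculation of the average rate it implies. It assumes 4GB means 4 × 10^9 bytes and that the downloads are spread evenly over the day, which real crawling is not, so bursts during peak hours could be well above this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(bytes_per_day):
    """Return (bytes per second, megabits per second) averaged over one day."""
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    megabits_per_second = bytes_per_second * 8 / 1_000_000
    return bytes_per_second, megabits_per_second


bps, mbps = average_rate(4 * 10**9)  # assumed: 4GB = 4 * 10^9 bytes
print(f"{bps / 1000:.1f} kB/s, {mbps:.2f} Mbit/s")  # prints "46.3 kB/s, 0.37 Mbit/s"
```

On average that is well under 1 Mbit/s, so if the bot is hurting the site, the cost is more likely in generating the pages (CPU, database) than in raw bandwidth.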
Shane
-
We are hosting the site on our own hardware at a big colo. I know that we are upgrading servers but they will not be online until the end of July.
Thanks!
-
I wouldn't slow the crawl rate. A high crawl rate is good so that Google can keep their index of your website current.
The better solution is to reconsider your hardware and networking setup. Do you know how you are being hosted? From my own experience with a website of that size, a load balancer on two decent dedicated servers should handle the load without problems. Google crawling your pages shouldn't create noticeable overhead on the right setup.