Googlebot found an extremely high number of URLs on your site

BenFox

I keep getting the "Googlebot found an extremely high number of URLs on your site" message in Google Webmaster Tools (GWMT) for one of the sites that I manage.

The message reads:

Googlebot encountered problems while crawling your site.

Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.

I understand the nature of the message: the site uses faceted navigation and is genuinely generating a lot of duplicate pages. However, to stop this from becoming an issue, we do the following:

Noindex a large number of pages using the on-page meta tag.
Use a canonical tag where it is appropriate.
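For readers unfamiliar with the two tags, here is a minimal sketch of what they look like when emitted from a template helper. The helper names and the example.com URL are made up for illustration and are not taken from the site in question.

```python
from html import escape

def noindex_tag():
    """Robots meta tag asking search engines not to index the page."""
    return '<meta name="robots" content="noindex">'

def canonical_tag(canonical_url):
    """Link element pointing at the preferred version of the page."""
    # Escape the URL so characters such as & are valid inside the attribute.
    return '<link rel="canonical" href="%s">' % escape(canonical_url, quote=True)

# Hypothetical faceted page whose preferred version is the plain listing.
print(noindex_tag())
print(canonical_tag("https://example.com/shoes"))
```

Both tags live in the page's head, so a crawler has to fetch the page before it can see either of them.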

But we still get the error and a lot of the example pages that Google suggests are affected by the issue are actually pages with the no-index tag.

So my question is how do I address this problem?

I'm thinking that, as it's a crawling issue, the solution might involve the nofollow meta tag.

Any suggestions appreciated.

Myntra

I feel we are missing some information here.

For example, on our site we have added a canonical tag to the pages that have query parameters. We have also set these parameters to "Representative URL" under URL Parameters in Google Webmaster Tools. Even after this, we received the message "Googlebot found an extremely high number of URLs on your site".

The surprising thing is that these parameters have existed on the site for a long time, and the total URL count is falling. Even so, Google started sending us this message in Feb 2014. It seems there has been an algorithmic change, and some additional conditions that have not been highlighted in this thread now have to be taken care of. Not sure what.

Dr-Pete

Although I generally find NOINDEX works better than Google claims, I think @donford is essentially right - you still need to solve some of the architecture issues, or Google will attempt to re-crawl.

It's a complex problem, and sometimes a combination of NOINDEX, canonical, 301s, 404s, rel=prev/next, etc. all come into play. You don't usually need a "perfect" solution, but one tool rarely fits all situations these days.

Google has suggested that you try parameter handling in GWT. NOINDEX won't prevent crawling (just indexation), but GWT parameters help save crawler bandwidth. I've had mixed results on large sites, honestly, but it may be worth a try.
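Before setting up parameter handling, it can help to know which parameters are actually responsible for the URL explosion. A rough sketch of that check in Python, assuming you have a list of crawled URLs from a log or crawl export (the URLs and parameter names below are invented for illustration):

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

def parameter_report(urls):
    """Return (parameter, distinct value count) pairs, largest count first."""
    values = defaultdict(set)
    for url in urls:
        query = urlsplit(url).query
        for key, value in parse_qsl(query, keep_blank_values=True):
            values[key].add(value)
    # Parameters with many distinct values are the likeliest crawl sinks.
    return sorted(((key, len(seen)) for key, seen in values.items()),
                  key=lambda item: (-item[1], item[0]))

# Hypothetical sample of crawled URLs.
crawled = [
    "https://example.com/shoes?color=red&sort=price",
    "https://example.com/shoes?color=blue&sort=price",
    "https://example.com/shoes?color=green&sort=name",
    "https://example.com/shoes?page=2",
]
print(parameter_report(crawled))
# prints [('color', 3), ('sort', 2), ('page', 1)]
```

The parameters at the top of the list are the first candidates for parameter handling or for an architecture fix.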

BenFox

I was afraid that this might be the case.

Thanks for the help.

donford

Hi Ben,

You are attempting to fix your SEO issue by using NOINDEX and CANONICAL, but you are not fixing the main issue, which is that the URLs are still there.

NOINDEX will not stop Google from recognizing the link nor will NOFOLLOW. They actually use every link's information in one form or another regardless of the tag attributes.

Here is a direct quote from Matt Cutts about NOINDEX:

"Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue)..."

REF: http://www.mattcutts.com/blog/google-noindex-behavior/

The first solution I would be interested in is working on the architecture of the site to see if there is a way to stop the crazy number of URLs being generated and/or consolidate them to a single point. The next step would be to see if there is any commonality between these extra URLs and whether a 301 redirect could be used to consolidate them.

I think what you're really after is a way to fix this with a tag or patch, but I think the best way to fix this is to replace the engine that is driving these URLs. In that case you're going to have to be a bit more specific about what kind of site you're using (Joomla, WordPress, osCommerce, etc.) for a more specific answer.
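To illustrate the 301 idea, here is a minimal sketch in Python of working out a single redirect target for URL variants that differ only in parameter order or in parameters that never change the content. The parameter names in DROP_PARAMS and the example.com URLs are assumptions for illustration; the real list depends on the site's engine.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters assumed never to change page content.
DROP_PARAMS = {"sessionid", "ref"}

def redirect_target(url):
    """Return the consolidated URL to 301 to, or None if url is already in that form."""
    parts = urlsplit(url)
    pairs = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key not in DROP_PARAMS]
    pairs.sort()  # one fixed parameter order, so reordered variants collapse
    normalized = urlunsplit((parts.scheme, parts.netloc, parts.path,
                             urlencode(pairs), parts.fragment))
    return None if normalized == url else normalized

print(redirect_target("https://example.com/shoes?size=9&color=red&ref=home"))
# prints https://example.com/shoes?color=red&size=9
```

A request handler would issue the 301 only when the function returns a URL, which avoids redirect loops because the consolidated form maps to None.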

Hope it helps.
