Googlebot found an extremely high number of URLs on your site
-
I keep getting the "Googlebot found an extremely high number of URLs on your site" message in the GWMT for one of the sites that I manage.
The error is as follows:
Googlebot encountered problems while crawling your site.
Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.
I understand the nature of the message: the site uses faceted navigation and is genuinely generating a lot of duplicate pages. However, to stop this from becoming an issue we do the following:
- No-index a large number of pages using the on-page meta tag.
- Use a canonical tag where it is appropriate.
But we still get the error and a lot of the example pages that Google suggests are affected by the issue are actually pages with the no-index tag.
So my question is how do I address this problem?
I'm thinking that as it's a crawling issue the solution might involve the no-follow meta tag.
Any suggestions appreciated.
-
I feel we are missing some information here.
For example, on our site we have added a canonical tag to the pages that have query parameters. We have also set these parameters to "Representative URL" under URL Parameters in Google Webmaster Tools. Even so, we received the message "Googlebot found an extremely high number of URLs on your site".
The surprising thing is that these parameters have existed on the site for a long time, and the total URL count is falling. Despite this, Google has been sending us this message since Feb 2014. It seems there has been an algorithmic change, and some additional conditions that have not been highlighted in this thread now have to be taken care of. Not sure what.
-
Although I generally find NOINDEX works better than Google claims, I think @donford is essentially right - you still need to solve some of the architecture issues, or Google will attempt to re-crawl.
It's a complex problem, and sometimes a combination of NOINDEX, canonical, 301s, 404s, rel=prev/next, etc. all come into play. You don't usually need a "perfect" solution, but one tool rarely fits all situations these days.
Google has suggested that you try parameter handling in GWT. NOINDEX won't prevent crawling (just indexation), but GWT parameters help save crawler bandwidth. I've had mixed results on large sites, honestly, but it may be worth a try.
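To illustrate why NOINDEX can't save crawl bandwidth: the directive lives inside the page's HTML, so a crawler has to fetch the page before it can read it. Here is a rough Python sketch that pulls the robots meta directives and the canonical URL out of a page's markup, which can be handy for spot-checking the example URLs Google lists. The class name, function name, and the example.com markup are made up for illustration; this is not how Googlebot itself works.

```python
from html.parser import HTMLParser


class HeadDirectives(HTMLParser):
    """Collects robots meta directives and the canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.robots = []       # e.g. ["noindex", "follow"]
        self.canonical = None  # href of <link rel="canonical">, if present

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; value may be None.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            content = attrs.get("content", "")
            self.robots = [d.strip().lower() for d in content.split(",") if d.strip()]
        elif tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonical = attrs.get("href")


def read_directives(html):
    """Return (robots_directives, canonical_url) for an HTML string."""
    parser = HeadDirectives()
    parser.feed(html)
    return parser.robots, parser.canonical


# Hypothetical page source for a faceted URL.
sample = """<html><head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="http://www.example.com/shoes/">
</head><body></body></html>"""

robots, canonical = read_directives(sample)
print(robots, canonical)
```

The point is that `read_directives` needs the full HTML as input. Any page tagged this way still has to be downloaded at least once, and again on each recrawl, for the tag to be seen.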
-
I was afraid that this might be the case.
Thanks for the help.
-
Hi Ben,
You are attempting to fix your SEO issue by using NOINDEX and CANONICAL, but you are not fixing the main issue, which is that the URLs are still there.
NOINDEX will not stop Google from recognizing the link, nor will NOFOLLOW. They actually use every link's information in one form or another, regardless of the tag attributes.
Here is a direct quote from Matt Cutts about NOINDEX:
"Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue)....."
REF: http://www.mattcutts.com/blog/google-noindex-behavior/
The first solution I would be interested in is working on the architecture of the site to see if there is a way to stop the crazy number of URLs being generated and/or consolidate them to a single point. The next step would be to see if there is any commonality between these extra URLs and whether a 301 redirect could be used to consolidate them.
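As a rough idea of how you might look for that commonality, here is a small Python sketch that strips facet-style query parameters from a list of crawled URLs and groups the variants under the URL they collapse to. The parameter names in `FACET_PARAMS` and the example.com URLs are pure assumptions; you would swap in whatever your own platform generates.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical facet parameters that only filter or sort existing content.
FACET_PARAMS = {"color", "size", "sort", "page_size"}


def consolidate(url):
    """Return the URL with facet parameters dropped and the rest sorted."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in FACET_PARAMS
    )
    # Rebuild without the fragment; an empty query leaves no trailing "?".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each consolidated URL to the crawled variants that collapse to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[consolidate(url)].append(url)
    return dict(groups)


# Made-up sample of crawled URLs.
crawled = [
    "http://www.example.com/shoes/?color=red&size=9",
    "http://www.example.com/shoes/?size=9&color=red",
    "http://www.example.com/shoes/?sort=price",
    "http://www.example.com/shoes/?id=12&color=blue",
]

groups = group_duplicates(crawled)
for target, variants in groups.items():
    print(target, len(variants))
```

Groups with many variants are the candidates for consolidation. Whether each one is better handled by a 301, a canonical tag, or by not generating the links at all depends on the site.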
I think what you're really after is a way to fix this with a tag or patch, but I think the best way to fix this is to replace the engine that is driving these URLs. For a more specific answer, you'll have to tell us what kind of platform the site runs on (Joomla, WordPress, Oscommerce, etc.).
Hope it helps.