Googlebot found an extremely high number of URLs on your site
-
I keep getting the "Googlebot found an extremely high number of URLs on your site" message in the GWMT for one of the sites that I manage.
The error is as below-
Googlebot encountered problems while crawling your site.
Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.
I understand the nature of the message - the site uses a faceted navigation and is genuinely generating a lot of duplicate pages. However in order to stop this from becoming an issue we do the following;
- No-index a large number of pages using the on page meta tag.
- Use a canonical tag where it is appropriate
But we still get the error and a lot of the example pages that Google suggests are affected by the issue are actually pages with the no-index tag.
So my question is how do I address this problem?
I'm thinking that as it's a crawling issue the solution might involve the no-follow meta tag.
any suggestions appreciated.
-
I feel we are missing some information here.
For example, for our site we have done a canonical on the pages where we have query parameters. We have also specified these parameters as representative URL in Google Webmaster - URL parameters. Even after this we received this message "Googlebot found an extremely high number of URLs on your site".
The surprising thing is that these parameters are existing on the site for a long time, and the total URL count is reducing. Even after this Google has started sending this message to us since Feb 2014. Seems there has been some algorithmic change because of which some additional conditions that have not been highlighted in this thread have to be taken care of.. Not sure what..
-
Although I generally find NOINDEX works better than Google claims, I think @donford is essentially right - you still need to solve some of the architecture issues, or Google will attempt to re-crawl.
It's a complex problem, and sometimes a combination of NOINDEX, canonical, 301s, 404s, rel=prev/next, etc. all come into play. You don't usually need a "perfect" solution, but one tool rarely fits all situations these days.
Google has suggested that you try parameter handling in GWT. NOINDEX won't prevent crawling (just indexation), but GWT parameters help save crawler bandwidth. I've had mixed results on large sites, honestly, but it may be worth a try.
-
I was afraid that this might be the case.
Thanks for the help.
-
Hi Ben,
You are attempting to fix your SEO issue by using NOINDEX & CANONICAL but you are not fixing the main issue which is the URL's are still there.
NOINDEX will not stop Google from recognizing the link nor will NOFOLLOW. They actually use every link's information in one form or another regardless of the tag attributes.
Here is a direct quote from Matt Cutts about NOINDEX:
"Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue).....
REF: http://www.mattcutts.com/blog/google-noindex-behavior/
The first solution I would be interested in is working on the architecture of the site to see if there is a way to stop the crazy amount of URL's being generated and/or consolidate them to a single point. The next step would be to see if there is any commonality between these extra URL's and if there is any possibility to use a 301 redirect to consolidate these extra urls.
I think what you're really after was a way to fix this with a tag or patch, but I think the best way to fix this is to replace the engine that is driving these URL's. You're going to have to be a bit more specific in such case as to what kind of site you're using (Joomla, WordPress, Oscommerce, etc) for a more specific answer.
Hope it helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Site Speed Testing Tools For Production Sites
Hi Guys, Any free site speed testing tools for sites in production, which are password protected? We want to test site speed before the new site goes live on top priority pages. Site is on Shopify – we tried google page insights while being logged into the production site but believe its just recording the speed of the password page. Cheers.
Intermediate & Advanced SEO | | brandonegroup1 -
My url disappeared from Google but Search Console shows indexed. This url has been indexed for more than a year. Please help!
Super weird problem that I can't solve for last 5 hours. One of my urls: https://www.dcacar.com/lax-car-service.html Has been indexed for more than a year and also has an AMP version, few hours ago I realized that it had disappeared from serps. We were ranking on page 1 for several key terms. When I perform a search "site:dcacar.com " the url is no where to be found on all 5 pages. But when I check my Google Console it shows as indexed I requested to index again but nothing changed. All other 50 or so urls are not effected at all, this is the only url that has gone missing can someone solve this mystery for me please. Thanks a lot in advance.
Intermediate & Advanced SEO | | Davit19850 -
Redirect old "not found" url (at http) to new corresponding page (now at https)
My least favorite part of SEO 😉 I'm trying to redirect an old url that no longer exists to our new website that is built with https. The old url: http://www.thinworks.com/palm-beach-gardens-team/ New url: https://www.thinworks.com/palm-beach-gardens/ This isn't working with my standard process of the quick redirection plugin in WP or through htaccess because the old site url is at http and not https. Any help would be much appreciated! How do I accomplish this, where do I do it and what's the code I'd use? Thank you Moz community! Ricky
Intermediate & Advanced SEO | | SUCCESSagency0 -
What to do about old urls that don't logically 301 redirect to current site?
Mozzers, I have changed my site url structure several times. As a result, I now have a lot of old URLs that don't really logically redirect to anything in the current site. I started out 404-ing them, but it seemed like Google was penalizing my crawl rate AND it wasn't removing them from the index after being crawled several times. There are way too many (>100k) to use the URL removal tool even at a directory level. So instead I took some advice and changed them to 200, but with a "noindex" meta tag and set them to not render any content. I get less errors but I now have a lot of pages that do this. Should I (a) just 404 them and wait for Google to remove (b) keep the 200, noindex or (c) are there other things I can do? 410 maybe? Thanks!
Intermediate & Advanced SEO | | jcgoodrich0 -
Why is this site not indexed by Google?
Hi all and thanks for your help in advance. I've been asked to take a look at a site, http://www.yourdairygold.ie as it currently does not appear for its brand name, Your Dairygold on Google Ireland even though it's been live for a few months now. I've checked all the usual issues such as robots.txt (doesn't have one) and the robots meta tag (doesn't have them). The even stranger thing is that the site does rank on Yahoo! and Bing. Google Webmaster Tools shows that Googlebot is crawling around 150 pages a day but the total number of pages indexed is zero. It does appear if you carry out a site: search on Google however. The site is very poorly optimised in terms of title tags, unnecessary redirects etc which I'm working on now but I wondered if you guys had any further insights. Thanks again for your help.
Intermediate & Advanced SEO | | iProspect-Ireland0 -
Redirecting Pages from site A to site B
Hi, I have a client who have a solid, high ranking content based site (site A). They have now created an ecommerce site in addition (site B). To give site B a boost in terms of search engine visibility upon launch, they now wish to redirect approx 90% of site As pages to site B. What would be the implications of this? Apart from customers being automatically redirected from the page they thought they where landing on, how would google now view site A? What are your thoughts to thier idea. I am trying to talk them out of it as I think its a poor one.
Intermediate & Advanced SEO | | Webrevolve0 -
Micro sites?
Hi, I have been speaking to seo firms regarding strategies and they mentioned setting up micro sites under domains that are relevant. i.e setting up armanidoamin.co.uk and we use it as a blog type site to update all info, product reviews, news relating to armani. Whats peoples thoughts on this? Does it work? Is it worth the effort? Im not so sure but obviously looking for ideas. Cheers
Intermediate & Advanced SEO | | YNWA0 -
Can your site be penalized for changing the url structure and if so how long till you get back?
I'm doing well on yahoo and bing and the only reason I can think of for why I'm not showing on Google is because I changed the url structure a couple of months ago. I have solid on and off page done for this site.
Intermediate & Advanced SEO | | deciph220