Googlebot found an extremely high number of URLs on your site
-
I keep getting the "Googlebot found an extremely high number of URLs on your site" message in the GWMT for one of the sites that I manage.
The message reads as follows:
Googlebot encountered problems while crawling your site.
Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.
I understand the nature of the message: the site uses faceted navigation and is genuinely generating a lot of duplicate pages. However, to stop this from becoming an issue we do the following:
- Noindex a large number of pages using the on-page meta robots tag.
- Use a canonical tag where it is appropriate.
But we still get the error, and a lot of the example pages that Google lists as affected are actually pages with the noindex tag.
So my question is how do I address this problem?
Since it's a crawling issue, I'm thinking the solution might involve the nofollow meta tag.
Any suggestions appreciated.
-
I feel we are missing some information here.
For example, on our site we have added a canonical tag to the pages that take query parameters. We have also set these parameters to "representative URL" under URL Parameters in Google Webmaster Tools. Even after this we received the message "Googlebot found an extremely high number of URLs on your site".
The surprising thing is that these parameters have existed on the site for a long time, and the total URL count is falling. Even so, Google started sending us this message in Feb 2014. It seems there has been some algorithmic change, and some additional conditions that have not been highlighted in this thread now have to be taken care of. Not sure what.
-
Although I generally find NOINDEX works better than Google claims, I think @donford is essentially right - you still need to solve some of the architecture issues, or Google will attempt to re-crawl.
It's a complex problem, and sometimes a combination of NOINDEX, canonical, 301s, 404s, rel=prev/next, etc. all come into play. You don't usually need a "perfect" solution, but one tool rarely fits all situations these days.
Google has suggested that you try parameter handling in GWT. NOINDEX won't prevent crawling (just indexation), but GWT parameters help save crawler bandwidth. I've had mixed results on large sites, honestly, but it may be worth a try.
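Before configuring parameters, it can help to see which ones are actually multiplying your URLs. Here is a minimal Python sketch, assuming you can export a list of crawled URLs from your server logs or a crawler. The sample URLs and the `parameter_report` helper are made up for illustration, not taken from any Google tool.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def parameter_report(urls):
    """Return {parameter name: number of distinct values seen} for a list of URLs."""
    values = defaultdict(set)
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values so that "?sort=" still counts as a value of "sort"
        for name, value in parse_qsl(query, keep_blank_values=True):
            values[name].add(value)
    return {name: len(seen) for name, seen in values.items()}


# Hypothetical crawl sample; replace with URLs from your own logs or crawl.
sample = [
    "https://www.example.com/shoes?colour=red&size=9",
    "https://www.example.com/shoes?colour=blue&size=9",
    "https://www.example.com/shoes?colour=red&size=10&sort=price",
    "https://www.example.com/shoes?sort=price",
]

report = parameter_report(sample)
# List the parameters with the most distinct values first.
for name, count in sorted(report.items(), key=lambda item: -item[1]):
    print(f"{name}: {count} distinct values")
```

Parameters with many distinct values that don't change the page content are usually the first candidates for parameter handling.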
-
I was afraid that this might be the case.
Thanks for the help.
-
Hi Ben,
You are attempting to fix your SEO issue by using NOINDEX and CANONICAL, but you are not fixing the main issue, which is that the URLs are still there.
NOINDEX will not stop Google from recognizing the link, and nor will NOFOLLOW. They actually use every link's information in one form or another regardless of the tag attributes.
Here is a direct quote from Matt Cutts about NOINDEX:
"Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue)..."
REF: http://www.mattcutts.com/blog/google-noindex-behavior/
The first solution I would look at is the architecture of the site, to see if there is a way to stop the crazy number of URLs being generated and/or consolidate them to a single point. The next step would be to see if these extra URLs have anything in common, and whether a 301 redirect could be used to consolidate them.
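As a rough illustration of the consolidation idea, here is a minimal Python sketch that maps a URL to a single consolidated form and reports where a 301 would point. The `REDUNDANT_PARAMS` set and both function names are hypothetical. Which parameters are safe to drop depends entirely on your site, so treat this as a sketch under that assumption and not a drop-in rule.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters assumed NOT to change the page content.
# Audit your own site before dropping anything.
REDUNDANT_PARAMS = {"sort", "sessionid", "ref"}


def canonical_url(url):
    """Drop redundant parameters and sort the rest so equivalent URLs match."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in REDUNDANT_PARAMS
    ]
    kept.sort()  # a stable order means ?a=1&b=2 and ?b=2&a=1 collapse together
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def redirect_target(url):
    """Return the URL to 301 to, or None if the URL is already in consolidated form."""
    target = canonical_url(url)
    return target if target != url else None


print(redirect_target("https://www.example.com/shoes?sort=price&colour=red&ref=nav"))
print(redirect_target("https://www.example.com/shoes?colour=red"))
```

The same mapping could feed the canonical tag on each page, so the redirect and the canonical always agree.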
I think what you're really after is a way to fix this with a tag or patch, but I think the best way to fix this is to replace the engine that is driving these URLs. For a more specific answer, you'll have to tell us what kind of site you're running (Joomla, WordPress, osCommerce, etc.).
Hope it helps.