Googlebot found an extremely high number of URLs on your site
-
I keep getting the "Googlebot found an extremely high number of URLs on your site" message in Google Webmaster Tools (GWMT) for one of the sites that I manage.
The message reads as follows:
Googlebot encountered problems while crawling your site.
Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.
I understand the nature of the message: the site uses faceted navigation and is genuinely generating a lot of duplicate pages. However, to stop this from becoming an issue, we do the following:
- No-index a large number of pages using the on-page meta tag.
- Use a canonical tag where it is appropriate.
But we still get the message, and many of the example pages that Google lists as affected are actually pages with the no-index tag.
So my question is how do I address this problem?
I'm thinking that, as it's a crawling issue, the solution might involve the no-follow meta tag.
Any suggestions appreciated.
-
I feel we are missing some information here.
For example, on our site we have added a canonical tag to the pages that have query parameters. We have also set these parameters to "representative URL" under URL Parameters in Google Webmaster Tools. Even after this, we received the message "Googlebot found an extremely high number of URLs on your site".
The surprising thing is that these parameters have existed on the site for a long time, and the total URL count is going down. Despite this, Google has been sending us this message since Feb 2014. It seems there has been some algorithmic change, so that some additional conditions not yet mentioned in this thread have to be taken care of. Not sure what.
-
Although I generally find NOINDEX works better than Google claims, I think @donford is essentially right - you still need to solve some of the architecture issues, or Google will attempt to re-crawl.
It's a complex problem, and sometimes a combination of NOINDEX, canonical, 301s, 404s, rel=prev/next, etc. all come into play. You don't usually need a "perfect" solution, but one tool rarely fits all situations these days.
Google has suggested that you try parameter handling in GWT. NOINDEX won't prevent crawling (just indexation), but GWT parameters help save crawler bandwidth. I've had mixed results on large sites, honestly, but it may be worth a try.
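To make the crawl-versus-index distinction concrete: the NOINDEX directive sits inside the page's HTML, so a crawler only sees it after it has already fetched the URL. Below is a minimal Python sketch of reading that directive from a fetched page. The function name `robots_directives` and the sample HTML are made up for illustration; this is not how Googlebot itself is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML document.

    Hypothetical helper: the HTML has to be downloaded before this can run,
    which is why a noindex tag does not save any crawl bandwidth.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Illustrative page source, not taken from any real site.
sample_html = (
    '<html><head><meta name="robots" content="NOINDEX, follow"></head>'
    "<body></body></html>"
)
directives = robots_directives(sample_html)
```

Running something like this over the example URLs from the GWT message is one way to confirm which of them carry the tag, but every one of those URLs still costs a fetch.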
-
I was afraid that this might be the case.
Thanks for the help.
-
Hi Ben,
You are attempting to fix your SEO issue by using NOINDEX and CANONICAL, but you are not fixing the main issue, which is that the URLs are still there.
NOINDEX will not stop Google from recognizing the link, and neither will NOFOLLOW. Google actually uses every link's information in one form or another, regardless of the tag attributes.
Here is a direct quote from Matt Cutts about NOINDEX:
"Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue)..."
REF: http://www.mattcutts.com/blog/google-noindex-behavior/
The first solution I would look at is working on the architecture of the site to see if there is a way to stop the crazy number of URLs being generated and/or consolidate them to a single point. The next step would be to see if there is any commonality between these extra URLs and whether a 301 redirect could be used to consolidate them.
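As a rough illustration of the "look for commonality, then consolidate" idea, here is a small Python sketch. It counts which query parameters appear across a list of crawled URLs, then maps each faceted URL to a version with the facet parameters removed, which could serve as a candidate 301 target. The parameter names in `FACET_PARAMS`, the example.com URLs, and the function names are all assumptions for illustration. Whether a given facet should really be redirected (rather than canonicalized or kept) depends on the site.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical facet parameters; replace with the ones your site generates.
FACET_PARAMS = {"color", "size", "sort"}


def param_counts(urls):
    """Count how many URLs use each query parameter, to spot common facets."""
    counts = Counter()
    for url in urls:
        pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
        for key in {name for name, _ in pairs}:
            counts[key] += 1
    return counts


def canonical_url(url):
    """Drop facet parameters and sort the rest so equivalent URLs collapse."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in FACET_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def redirect_map(urls):
    """Map each URL that differs from its canonical form to that form."""
    mapping = {}
    for url in urls:
        target = canonical_url(url)
        if target != url:
            mapping[url] = target
    return mapping


# Made-up crawl sample for illustration only.
crawled = [
    "https://example.com/shoes?color=red&size=9",
    "https://example.com/shoes?size=9&color=red",
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes?page=2",
    "https://example.com/shoes",
]

counts = param_counts(crawled)
redirects = redirect_map(crawled)
```

In this sample, three faceted URLs collapse onto `https://example.com/shoes`, while the `page=2` URL is left alone because `page` is not in the assumed facet list.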
I think what you're really after is a way to fix this with a tag or patch, but I think the best way to fix this is to replace the engine that is driving these URLs. In that case, you're going to have to be a bit more specific about what kind of site you're running (Joomla, WordPress, osCommerce, etc.) to get a more specific answer.
Hope it helps.