Googlebot found an extremely high number of URLs on your site
-
I keep getting the "Googlebot found an extremely high number of URLs on your site" message in Google Webmaster Tools (GWMT) for one of the sites that I manage.
The message is as follows:
Googlebot encountered problems while crawling your site.
Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.
I understand the nature of the message: the site uses faceted navigation and genuinely generates a lot of duplicate pages. However, to stop this from becoming an issue we do the following:
- Noindex a large number of pages using the on-page robots meta tag.
- Use a canonical tag where appropriate.
But we still get the message, and many of the example pages that Google lists as affected are actually pages with the noindex tag.
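One way to double-check which of the flagged example pages really carry the tags is to parse their HTML. The sketch below is only an illustration: it works on HTML you have already downloaded, the class and function names are made up for this example, and it only looks at the robots meta tag and the canonical link element (not at HTTP headers).

```python
from html.parser import HTMLParser


class RobotsTagParser(HTMLParser):
    """Collects robots meta directives and the canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.nofollow = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            content = attrs.get("content") or ""
            directives = {d.strip().lower() for d in content.split(",")}
            self.noindex = self.noindex or "noindex" in directives
            self.nofollow = self.nofollow or "nofollow" in directives
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")


def inspect_html(html):
    """Return the noindex/nofollow flags and canonical URL found in the HTML."""
    parser = RobotsTagParser()
    parser.feed(html)
    return {
        "noindex": parser.noindex,
        "nofollow": parser.nofollow,
        "canonical": parser.canonical,
    }


# Hypothetical page source for a faceted URL.
page = (
    '<html><head>'
    '<meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="http://www.example.com/shoes">'
    '</head><body></body></html>'
)
print(inspect_html(page))
```

Running this over the example URLs from the message shows whether they are all noindexed pages or whether some slipped through without either tag.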
So my question is how do I address this problem?
I'm thinking that, as it's a crawling issue, the solution might involve the nofollow meta tag.
Any suggestions appreciated.
-
I feel we are missing some information here.
For example, on our site we use a canonical tag on the pages that have query parameters. We have also set these parameters to "representative URL" under URL Parameters in Google Webmaster Tools. Even after this we received the message "Googlebot found an extremely high number of URLs on your site".
The surprising thing is that these parameters have existed on the site for a long time, and the total URL count is falling. Despite this, Google has been sending us this message since February 2014. It seems there has been an algorithmic change, so that some additional conditions not mentioned in this thread now have to be taken care of. I'm not sure what they are.
-
Although I generally find NOINDEX works better than Google claims, I think @donford is essentially right - you still need to solve some of the architecture issues, or Google will attempt to re-crawl.
It's a complex problem, and sometimes a combination of NOINDEX, canonical, 301s, 404s, rel=prev/next, etc. all come into play. You don't usually need a "perfect" solution, but one tool rarely fits all situations these days.
Google has suggested that you try parameter handling in GWT. NOINDEX won't prevent crawling (just indexation), but GWT parameters help save crawler bandwidth. I've had mixed results on large sites, honestly, but it may be worth a try.
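Before setting up parameter handling, it helps to know which query parameters account for most of the crawled URLs. Here is a rough Python sketch of that audit, assuming you have a list of crawled URLs (for example, exported from your server logs); the sample URLs and the function name are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_parameter_usage(urls):
    """Count how many URLs each query parameter name appears in."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # Use a set so a parameter repeated within one URL is counted once.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical sample of faceted URLs.
sample = [
    "http://www.example.com/shoes?color=red&size=9",
    "http://www.example.com/shoes?color=blue",
    "http://www.example.com/shoes?sort=price&color=red",
    "http://www.example.com/shoes",
]

for name, n in count_parameter_usage(sample).most_common():
    print(name, n)
```

The parameters at the top of the list are the likeliest candidates for parameter handling or for an architecture fix.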
-
I was afraid that this might be the case.
Thanks for the help.
-
Hi Ben,
You are attempting to fix your SEO issue with NOINDEX and CANONICAL, but you are not fixing the main issue, which is that the URLs are still there.
NOINDEX will not stop Google from recognizing the link, and neither will NOFOLLOW. Google actually uses every link's information in one form or another, regardless of the tag attributes.
Here is a direct quote from Matt Cutts about NOINDEX:
"Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue).....
REF: http://www.mattcutts.com/blog/google-noindex-behavior/
The first solution I would look at is working on the architecture of the site, to see if there is a way to stop the crazy number of URLs being generated and/or consolidate them to a single point. The next step would be to see if there is any commonality between these extra URLs and whether a 301 redirect could be used to consolidate them.
I think what you're really after is a way to fix this with a tag or patch, but I think the best way to fix it is to replace the engine that is driving these URLs. In that case you're going to have to be a bit more specific about what kind of site you're using (Joomla, WordPress, osCommerce, etc.) to get a more specific answer.
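As a rough illustration of the 301 consolidation idea, the Python sketch below works out where a faceted URL should redirect. It assumes you have identified parameters that never change the page content (the names in IGNORED_PARAMS are hypothetical) and that sorting the remaining parameters into one fixed order is acceptable for your site. How you actually issue the 301 depends on your platform.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that do not change what the page shows.
IGNORED_PARAMS = {"sort", "view", "sessionid"}


def redirect_target(url, ignored=IGNORED_PARAMS):
    """Return the consolidated URL to 301 to, or None if no redirect is needed."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Drop ignored parameters and put the rest in one fixed order,
    # so ?a=1&b=2 and ?b=2&a=1 end up at the same URL.
    kept = sorted((k, v) for k, v in pairs if k not in ignored)
    target = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
    return None if target == url else target


print(redirect_target("http://www.example.com/shoes?sort=price&color=red"))
print(redirect_target("http://www.example.com/shoes?color=red"))
```

The first call returns the URL without the sort parameter, and the second returns None because that URL is already in its consolidated form.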
Hope it helps.