Googlebot found an extremely high number of URLs on your site
-
I keep getting the "Googlebot found an extremely high number of URLs on your site" message in Google Webmaster Tools (GWMT) for one of the sites that I manage.
The message reads:
Googlebot encountered problems while crawling your site.
Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.
I understand the nature of the message: the site uses faceted navigation and genuinely generates a lot of duplicate pages. However, to stop this from becoming an issue, we do the following:
- Noindex a large number of pages using the on-page robots meta tag.
- Use a canonical tag where it is appropriate.
But we still get the message, and many of the example pages that Google lists as affected are pages that already carry the noindex tag.
So my question is how do I address this problem?
I'm thinking that, as it's a crawling issue, the solution might involve the nofollow meta tag.
Any suggestions appreciated.
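To see why faceted navigation produces so many URLs, here is a minimal sketch in Python. The facet names, their values, and the base URL are invented for illustration and are not taken from the site in question. It also assumes each facet is either unset or set to a single value, and that parameters always appear in one fixed order.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical facets for an illustrative product listing page.
FACETS = {
    "colour": ["red", "blue", "green"],
    "size": ["s", "m", "l", "xl"],
    "brand": ["acme", "globex"],
    "sort": ["price", "name"],
}


def count_facet_urls(facets):
    """Count distinct URLs when each facet is either unset or set to one value."""
    total = 1
    for values in facets.values():
        total *= len(values) + 1  # +1 for "facet not applied"
    return total


def facet_urls(base, facets):
    """Yield every URL the navigation could link to, parameters in a fixed order."""
    names = list(facets)
    choices = [[None] + facets[name] for name in names]
    for combo in product(*choices):
        pairs = [(n, v) for n, v in zip(names, combo) if v is not None]
        query = urlencode(pairs)
        yield f"{base}?{query}" if query else base


if __name__ == "__main__":
    print(count_facet_urls(FACETS))
    for url in list(facet_urls("https://www.example.com/shoes", FACETS))[:5]:
        print(url)
```

With these four made-up facets the count is already 180 URLs for a single category page. If the site also emits the same parameters in different orders, the number grows further.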
-
I feel we are missing some information here.
For example, on our site we have added a canonical tag to the pages that take query parameters. We have also set these parameters to "representative URL" under URL Parameters in Google Webmaster Tools. Even after this, we received the message "Googlebot found an extremely high number of URLs on your site".
The surprising thing is that these parameters have existed on the site for a long time, and the total URL count is falling. Even so, Google has been sending us this message since Feb 2014. It seems there has been some algorithmic change, and some additional conditions that have not been highlighted in this thread now have to be taken care of. Not sure what.
-
Although I generally find NOINDEX works better than Google claims, I think @donford is essentially right - you still need to solve some of the architecture issues, or Google will attempt to re-crawl.
It's a complex problem, and sometimes a combination of NOINDEX, canonical, 301s, 404s, rel=prev/next, etc. all come into play. You don't usually need a "perfect" solution, but one tool rarely fits all situations these days.
Google has suggested that you try parameter handling in GWT. NOINDEX won't prevent crawling (just indexation), but GWT parameters help save crawler bandwidth. I've had mixed results on large sites, honestly, but it may be worth a try.
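The point that NOINDEX only affects indexation can be sketched as follows: the directive lives inside the page's HTML, so a crawler has to fetch the URL before it can see it. This Python sketch uses the standard library's `html.parser`. The sample page and the helper name `robots_directives` are illustrative assumptions, not anything Googlebot is known to run.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> or <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def robots_directives(html):
    """Return the robots directives found in an already-downloaded HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Hypothetical page body: by the time this string exists, the URL has been crawled.
    page = (
        "<html><head>"
        '<meta name="robots" content="noindex, follow">'
        "</head><body>Filtered listing</body></html>"
    )
    print(robots_directives(page))
```

The function takes the page body as input, which is the whole point: the crawl request has already been spent by the time the noindex directive is read.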
-
I was afraid that this might be the case.
Thanks for the help.
-
Hi Ben,
You are attempting to fix your SEO issue by using NOINDEX and CANONICAL, but you are not fixing the main issue, which is that the URLs are still there.
NOINDEX will not stop Google from recognizing the link, nor will NOFOLLOW. They actually use every link's information in one form or another, regardless of the tag attributes.
Here is a direct quote from Matt Cutts about NOINDEX:
"Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue)..."
REF: http://www.mattcutts.com/blog/google-noindex-behavior/
The first solution I would be interested in is working on the architecture of the site to see if there is a way to stop the crazy number of URLs being generated and/or consolidate them to a single point. The next step would be to see if there is any commonality between these extra URLs and whether a 301 redirect could be used to consolidate them.
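One way to look for that commonality is to strip the parameters that only filter or sort, and group the crawled URLs by what is left. The following Python sketch does that with `urllib.parse`. The parameter names in `FACET_PARAMS` and the example URLs are assumptions made up for illustration; which parameters are safe to drop depends entirely on the site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that only filter or sort and never change the core content.
FACET_PARAMS = {"colour", "size", "sort", "sessionid"}


def consolidated_url(url, facet_params=FACET_PARAMS):
    """Drop facet parameters and sort the rest, giving one target URL per page."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in facet_params
    ]
    kept.sort()
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def group_by_target(urls):
    """Map each consolidated URL to the list of crawled URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[consolidated_url(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    crawled = [
        "https://www.example.com/shoes?colour=red&size=m",
        "https://www.example.com/shoes?size=m&colour=red",
        "https://www.example.com/shoes?page=2&sort=price",
        "https://www.example.com/shoes",
    ]
    for target, variants in group_by_target(crawled).items():
        print(target, "<-", len(variants), "variant(s)")
```

Each group with more than one variant is a candidate for a 301 redirect or a canonical tag pointing at the consolidated URL. Whether a redirect is appropriate still needs a human check, since some filtered pages may be worth keeping.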
I think what you're really after is a way to fix this with a tag or a patch, but I think the best fix is to replace the engine that is generating these URLs. For a more specific answer, you'll need to tell us what kind of platform the site runs on (Joomla, WordPress, osCommerce, etc.).
Hope it helps.