Lots of incorrect urls indexed - Googlebot found an extremely high number of URLs on your site
-
Hi,
Any assistance would be greatly appreciated.
Basically, our rankings and traffic etc have been dropping massively recently google sent us a message stating " Googlebot found an extremely high number of URLs on your site".
This first highligted us to the problem that for some reason our eCommerce site has recently generated loads (potentially thousands) of rubbish urls hencing giving us duplication everywhere which google is obviously penalizing us with in the terms of rankings dropping etc etc.
Our developer is trying to find the route cause of this but my concern is, How do we get rid of all these bogus urls ?. If we use GWT to remove urls it's going to take years.
We have just amended our Robot txt file to exclude them going forward but they have already been indexed so I need to know do we put a redirect 301 on them and also a HTTP Code 404 to tell google they don't exist ? Do we also put a No Index on the pages or what .
what is the best solution .?
A couple of example of our problems are here :
In Google type -
site:bestathire.co.uk inurl:"br"
You will see 107 results. This is one of many lot we need to get rid of.
Also -
site:bestathire.co.uk intitle:"All items from this hire company"
Shows 25,300 indexed pages we need to get rid of
Another thing to help tidy this mess up going forward is to improve on our pagination work. Our Site uses Rel=Next and Rel=Prev but no concanical.
As a belt and braces approach, should we also put concanical tags on our category pages whereby there are more than 1 page. I was thinking of doing it on the Page 1 of our most important pages or the View all or both ?. Whats' the general consenus ?
Any advice on both points greatly appreciated?
thanks
Sarah.
-
Ahhh, I see what you mean now. Yes, good idea .
Will get that implement to.
Yes, everything is duplicated.It's all the same apart from the url which seems to be bringing in to different locations instead of one.
Odd url Generated(notice it has 2 locations in it)
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250/Alfreton
Correct location specific urls -
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Alfreton/250
thanks
Sarah.
-
Since (I assume this is what is happening) your ecommerce platform is duplicating the entire page, code and all, and putting it at these new URLs, having the canonical tag of the original page URL in the code for the right/real page will mean that, when it gets duplicated, the canonical tag will get duplicated as well and point back to the original URL. Make sense?
Can you talk to your ecommerce platform provider? This can't be an intended feature!
-
Thanks Ruth for the very comprehensive answer. Greatly Appreciated !.
Just to clarify your suggestion about the Rel=Canonical tag. Put it on the preferred pages . When the duplicate odd urls get generated, they Wont have a canonical tag so google will know there are not the original page ?.. Is that correct.
Sorry I just got a bit confused as you said the duplicate pages will have a concanical tag as well ?
As for the existing pages, they are very recent so wouldn't assume they would have any pr to warrent a 301 as opposed to a 404 but guess either would be ok.
Also adding the Meta name no index tag as you suggested to sounds very wise so will get that done to.
We also can't find how these urls were created and then indexed so just hoping a debug file we just created may shed some light.
Will keep you posted....
Many thanks
Sarah
-
Oh how frustrating!
There are a couple of things that you can do. Updating your robots.txt is a good start since the next time your site is crawled, Google should find that and drop at least some of the offending pages from the index. I would also go in to every page of your site and add in a rel=canonical tag to the original version of the URL. That way, even if your ecommerce platform is generating odd versions of the URL, that canonical tag will be on the duplicate versions letting engines know they're not the original page.
For the existing pages, you could just 301 them all back to the original versions, or add the canonical tag pointing back to the original versions. I would also add the tag to these pages to let Google know not to include them in the index.
With pagination and canonicalization there are a few different approaches, and each has its pros and cons. Dr. Pete wrote a really great post on canonicalization that just went out, you can read it here: http://www.seomoz.org/blog/which-page-is-canonical. I also recommend reading Adam Audette's post on pagination options at Search Engine Land: http://searchengineland.com/the-latest-greatest-on-seo-pagination-114284. I hope that helps!
-
As long as you think the sitemap is done right it should be fine.
-
Yes we submitted mini site maps to webmaster originally a couple of months back as our site is 60K pages so we broke is down to categories it etc.
We have not submitted a new map since finding this problem.
We are in the process of using the sitemap generator to generator new site map to see if it picks up anything usual.
Are u suggesting to resubmit ?
thanks
Sarah
-
In the short term I would definitely use canonicals to let Google know which are the right pages until you can fix your problem. Also, have you submitted a sitemap to Webmasters?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why a certain URL ( a category URL ) disappears?
the page hasn't been spammed. - links are natural - onpage grader is perfect - there are useful high ranking articles linking to the page...pretty much everything is okay.....also all of my websites pages are okay and none of them has disappeared only this one ( the most important category of my site. )
Intermediate & Advanced SEO | | mohamadalieskandariii0 -
Should m-dot sites be indexed at all
I have a client with a site with a m-dot mobile version. They will move it to a responsive site sometime next year but in meanwhile I have a massive doubt. This m-dot site has some 30k indexed pages in Google. Each of this page is bidirectionally linked to the www. version (rel="alternate on the www, rel canonical on the m-dot) There is no noindex on the m-dot site, so I understand that Google might decide to index the m-dot pages regardless of the canonical to the www site. But my doubts stays: is it a bad thing that both the version are indexed? Is this having a negative impact on the crawling budget? Or risking some other bad consequence? and how is the mobile-first going to impact on this? Thanks
Intermediate & Advanced SEO | | newbiebird0 -
Index an URL without directly linking it?
Hi everyone, Here's a duplicate content challenge I'm facing: Let's assume that we sell brown, blue, white and black 'Nike Shoes model 2017'. Because of technical reasons, we really need four urls to properly show these variations on our website. We find substantial search volume on 'Nike Shoes model 2017', but none on any of the color variants. Would it be theoretically possible to show page A, B, C and D on the website and: Give each page a canonical to page X, which is the 'default' page that we want to rank in Google (a product page that has a color selector) but is not directly linked from the site Mention page X in the sitemap.xml. (And not A, B, C or D). So the 'clean' urls get indexed and the color variations do not? In other words: Is it possible to rank a page that is only discovered via sitemap and canonicals?
Intermediate & Advanced SEO | | Adriaan.Multiply0 -
Old URLs that have 301s to 404s not being de-indexed.
We have a scenario on a domain that recently moved to enforcing SSL. If a page is requested over non-ssl (http) requests, the server automatically redirects to the SSL (https) URL using a good old fashioned 301. This is great except for any page that no longer exists, in which case you get a 301 going to a 404. Here's what I mean. Case 1 - Good page: http://domain.com/goodpage -> 301 -> https://domain.com/goodpage -> 200 Case 2 - Bad page that no longer exists: http://domain.com/badpage -> 301 -> https://domain.com/badpage -> 404 Google is correctly re-indexing all the "good" pages and just displaying search results going directly to the https version. Google is stubbornly hanging on to all the "bad" pages and serving up the original URL (http://domain.com/badpage) unless we submit a removal request. But there are hundreds of these pages and this is starting to suck. Note: the load balancer does the SSL enforcement, not the CMS. So we can't detect a 404 and serve it up first. The CMS does the 404'ing. Any ideas on the best way to approach this problem? Or any idea why Google is holding on to all the old "bad" pages that no longer exist, given that we've clearly indicated with 301s that no one is home at the old address?
Intermediate & Advanced SEO | | boxclever0 -
Links to my site still showing in Webmaster Tools from a non-existent site
We owned 2 sites, with the pages on Site A all linking over to similar pages on Site B. We wanted to remove the links from Site A to Site B, so we redirected all the links on Site A to the homepage on Site A, and took Site A down completely. Unfortunately we are still seeing the links from Site A coming through on Google Webmaster Tools for Site B. Does anybody know what else we can do to remove these links?
Intermediate & Advanced SEO | | pedstores0 -
Received "Googlebot found an extremely high number of URLs on your site:" but most of the example URLs are noindexed.
An example URL can be found here: http://symptom.healthline.com/symptomsearch?addterm=Neck%20pain&addterm=Face&addterm=Fatigue&addterm=Shortness%20Of%20Breath A couple of questions: Why is Google reporting an issue with these URLs if they are marked as noindex? What is the best way to fix the issue? Thanks in advance.
Intermediate & Advanced SEO | | nicole.healthline0 -
Malicious site pointed A-Record to my IP, Google Indexed
Hello All, I launched my site on May 1 and as it turns out, another domain was pointing it's A-Record to my IP. This site is coming up as malicious, but worst of all, it's ranking on keywords for my business objectives with my content and metadata, therefore I'm losing traffic. I've had the domain host remove the incorrect A-Record and I've submitted numerous malware reports to Google, and attempted to request removal of this site from the index. I've resubmitted my sitemap, but it seems as though this offending domain is still being indexed more thoroughly than my legitimate domain. Can anyone offer any advice? Anything would be greatly appreciated! Best regards, Doug
Intermediate & Advanced SEO | | FranGen0 -
Changing Hosting Companies - Site Downtime - Google Indexing Concern
We are getting ready to switch to a new hosting company. When we make the switchover, our sites will be offline for a couple of hours and in some cases perhaps as long as 12 hours while DNS is configured -- should we be worried about Google trying to index pages and finding them unavailable? Any fear of Google de-indexing pages. Our guess was that Google would not de-index anything after just a short period of not being able to find pages -- it would have to be over an extended period of time before GOOGLE or BING would de-index pages -- CORRECT? Just want to gut check this before pulling the trigger on switch over to new hosting company. We appreciate input on this and/or any other thoughts regarding the switch over to new hosting company that we may not have thought of. Thanks, Matt
Intermediate & Advanced SEO | | MWM37720