Lots of incorrect urls indexed - Googlebot found an extremely high number of URLs on your site

SarahCollins

Hi,

Any assistance would be greatly appreciated.

Basically, our rankings and traffic have been dropping massively, and recently Google sent us a message stating "Googlebot found an extremely high number of URLs on your site".

This first alerted us to the problem: for some reason our eCommerce site has recently generated loads (potentially thousands) of rubbish URLs, giving us duplication everywhere, which Google is obviously penalizing us for in terms of rankings dropping, etc.

Our developer is trying to find the root cause of this, but my concern is: how do we get rid of all these bogus URLs? If we use GWT (Google Webmaster Tools) to remove URLs, it's going to take years.

We have just amended our robots.txt file to exclude them going forward, but they have already been indexed. So do we 301 redirect them, or return an HTTP 404 to tell Google they don't exist? Do we also put a noindex on the pages, or what?

What is the best solution?

A couple of examples of our problems:

In Google, type:

site:bestathire.co.uk inurl:"br"

You will see 107 results. This is one of many sets we need to get rid of.

Also:

site:bestathire.co.uk intitle:"All items from this hire company"

This shows 25,300 indexed pages we need to get rid of.

Another thing to help tidy this mess up going forward is to improve our pagination. Our site uses rel=next and rel=prev but no canonical.

As a belt-and-braces approach, should we also put canonical tags on our category pages that have more than one page? I was thinking of doing it on page 1 of our most important pages, or on the View All page, or both. What's the general consensus?
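For the pagination question, here is a minimal sketch of the `<link>` tags each listing page might carry. It assumes a hypothetical URL scheme (page 1 at the bare category URL, later pages at `?page=N`, the View All listing at `?view=all`) and a hypothetical `pagination_head_tags` helper; the `canonical_to_view_all` switch shows the two options asked about: each page canonicalizing to itself, or every page pointing at View All. Google has since said (2019) that it no longer uses rel=next/prev as an indexing signal, so the canonical choice carries more weight than it did.

```python
from html import escape


def pagination_head_tags(base_url: str, page: int, last_page: int,
                         canonical_to_view_all: bool = False) -> list[str]:
    """Return the <link> tags for page `page` of a paginated category.

    Hypothetical URL scheme: page 1 is base_url, page N is base_url?page=N,
    and the "view all" listing is base_url?view=all.
    """
    def page_url(n: int) -> str:
        return base_url if n == 1 else f"{base_url}?page={n}"

    # Option A: each page canonicalizes to itself.
    # Option B: every page canonicalizes to the View All listing.
    canonical = f"{base_url}?view=all" if canonical_to_view_all else page_url(page)
    tags = [f'<link rel="canonical" href="{escape(canonical)}">']
    if page > 1:
        tags.append(f'<link rel="prev" href="{escape(page_url(page - 1))}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{escape(page_url(page + 1))}">')
    return tags


for tag in pagination_head_tags("https://www.example.com/category", 2, 5):
    print(tag)
```

Pointing everything at View All only makes sense if that page loads quickly enough to be the one you want ranked; otherwise the self-referencing option is the safer default.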

Any advice on both points would be greatly appreciated.

thanks

Sarah.

SarahCollins

Ahhh, I see what you mean now. Yes, good idea.

Will get that implemented too.

Yes, everything is duplicated. It's all the same apart from the URL, which seems to be pulling in two different locations instead of one.

Odd URL generated (notice it has two locations in it):

http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250/Alfreton

Correct location-specific URLs:

http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250

http://www.bestathire.co.uk/rent/Vacuum_cleaners/Alfreton/250
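To size the problem, the odd URLs can be spotted mechanically. A sketch, assuming (from these three examples only) that a valid listing path is `/rent/<category>/<location>/<id>` and that a fourth segment is the stray second location; the `ODD_PATH` pattern and `correct_candidates` helper are hypothetical and would need checking against the real URL scheme.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern inferred from the example URLs only:
# /rent/<category>/<location>/<numeric id>/<second location>
ODD_PATH = re.compile(
    r"^/rent/(?P<cat>[^/]+)/(?P<loc1>[^/]+)/(?P<id>\d+)/(?P<loc2>[^/]+)/?$"
)


def correct_candidates(url: str) -> list[str]:
    """Return the single-location URLs that an odd two-location URL
    duplicates, or an empty list if the URL does not match the pattern."""
    parts = urlsplit(url)
    m = ODD_PATH.match(parts.path)
    if not m:
        return []
    base = f"{parts.scheme}://{parts.netloc}/rent/{m['cat']}"
    # Both locations produce a legitimate URL for the same listing ID.
    return [f"{base}/{m['loc1']}/{m['id']}", f"{base}/{m['loc2']}/{m['id']}"]


print(correct_candidates(
    "http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250/Alfreton"))
```

Run over a crawl export or server log, this gives a count of affected URLs and a first guess at where each one should point.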

thanks

Sarah.

RuthBurrReedy

Since (I assume this is what is happening) your ecommerce platform is duplicating the entire page, code and all, and putting it at these new URLs, having the canonical tag of the original page URL in the code for the right/real page will mean that, when it gets duplicated, the canonical tag will get duplicated as well and point back to the original URL. Make sense?
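To illustrate: if the template builds the canonical href from the page's own stored record rather than from whatever URL was requested, every duplicate URL that serves the same template carries the same canonical. A sketch with a hypothetical `canonical_path` field and `render_canonical` helper; the real platform's templating will differ.

```python
from html import escape

SITE = "http://www.bestathire.co.uk"


def render_canonical(page_record: dict, requested_path: str) -> str:
    """Build the canonical <link> for a page.

    The href comes only from the page's stored record (hypothetical
    "canonical_path" field); requested_path is accepted but deliberately
    ignored, so a duplicate URL serving the same page gets the same tag.
    """
    href = SITE + page_record["canonical_path"]
    return f'<link rel="canonical" href="{escape(href)}">'


# Hypothetical record for the Walsall vacuum cleaner listing.
walsall = {"canonical_path": "/rent/Vacuum_cleaners/Walsall/250"}
original = render_canonical(walsall, "/rent/Vacuum_cleaners/Walsall/250")
duplicate = render_canonical(walsall, "/rent/Vacuum_cleaners/Walsall/250/Alfreton")
print(original == duplicate)  # True
```

The thing to avoid is a template that echoes the requested URL into the canonical tag, because then each duplicate declares itself canonical.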

Can you talk to your ecommerce platform provider? This can't be an intended feature!

SarahCollins

Thanks Ruth for the very comprehensive answer. Greatly appreciated!

Just to clarify your suggestion about the rel=canonical tag: put it on the preferred pages. When the odd duplicate URLs get generated, they won't have a canonical tag, so Google will know they are not the original page? Is that correct?

Sorry, I just got a bit confused, as you said the duplicate pages will have a canonical tag as well?

As for the existing pages, they are very recent, so I wouldn't assume they have any PR (PageRank) to warrant a 301 as opposed to a 404, but I guess either would be OK.
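On the 301 vs 404 question, here is a minimal WSGI sketch of the two responses. The `BOGUS_TO_ORIGINAL` lookup and `make_app` factory are hypothetical; in practice this logic would live in the platform's own routing or server rewrite rules, and the lookup might be built from a crawl export or the server logs.

```python
SITE = "http://www.bestathire.co.uk"

# Hypothetical lookup of known bogus paths -> the page each one duplicates.
BOGUS_TO_ORIGINAL = {
    "/rent/Vacuum_cleaners/Walsall/250/Alfreton": "/rent/Vacuum_cleaners/Walsall/250",
}


def make_app(redirect: bool = True):
    """Return a minimal WSGI app: 301 (or 404) for bogus paths, 200 otherwise."""
    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        target = BOGUS_TO_ORIGINAL.get(path)
        if target is None:
            # Stand-in for the normal page handler.
            start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
            return [b"normal page\n"]
        if redirect:
            # Permanent redirect back to the original URL.
            start_response("301 Moved Permanently", [("Location", SITE + target)])
            return [b""]
        # Tell crawlers the URL does not exist.
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found\n"]
    return app
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, make_app())` returns a server whose `serve_forever()` method runs the app. A 301 passes along whatever value the duplicate picked up and sends visitors to a real page; a 404 is simpler when, as here, the duplicates are new and have earned nothing.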

Adding the meta robots noindex tag as you suggested also sounds very wise, so we will get that done too.

We also can't find how these URLs were created and then indexed, so we're just hoping a debug file we just created may shed some light.

Will keep you posted....

Many thanks

Sarah

RuthBurrReedy

Oh how frustrating!

There are a couple of things that you can do. Updating your robots.txt is a good start, since the next time your site is crawled Google should find that and drop at least some of the offending pages from the index. I would also go into every page of your site and add a rel=canonical tag pointing to the original version of the URL. That way, even if your ecommerce platform is generating odd versions of the URL, that canonical tag will be on the duplicate versions, letting engines know they're not the original page.
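Before deploying a robots.txt change it helps to check which paths a rule would actually block. Python's standard `urllib.robotparser` does plain prefix matching, so this sketch implements Google's documented wildcard syntax instead (`*` matches any run of characters, a trailing `$` anchors the end). The `Disallow: /rent/*/*/*/` rule is a hypothetical one derived only from the example URLs. One caveat worth weighing: Googlebot does not fetch a URL that robots.txt disallows, so it also cannot see a 301, 404 or noindex on that URL.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule to a regex using Google's wildcard
    syntax: '*' matches any run of characters (including '/'), and a
    trailing '$' anchors the rule to the end of the URL path."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    # Without '$' a rule is a prefix match, which re.match already gives.
    return re.compile(pattern + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """True if any Disallow rule matches. Simplification: Allow lines and
    Google's longest-match precedence between Allow/Disallow are ignored."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# Hypothetical rule "Disallow: /rent/*/*/*/": blocks paths with a segment
# after the numeric ID (the two-location duplicates). Caveat: it would also
# block a legitimate listing URL that ends in a trailing slash.
RULES = ["/rent/*/*/*/"]
for p in ["/rent/Vacuum_cleaners/Walsall/250/Alfreton",
          "/rent/Vacuum_cleaners/Walsall/250"]:
    print(p, is_disallowed(p, RULES))
```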

For the existing pages, you could just 301 them all back to the original versions, or add the canonical tag pointing back to the original versions. I would also add the meta robots noindex tag to these pages to let Google know not to include them in the index.
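One way to keep the noindex off the real pages while applying it to every duplicate is to emit it only when the requested URL differs from the page's canonical address. A sketch with a hypothetical `robots_meta` helper; the same instruction can alternatively be sent as an `X-Robots-Tag: noindex` HTTP header.

```python
from urllib.parse import urlsplit

NOINDEX = '<meta name="robots" content="noindex">'


def robots_meta(requested_url: str, canonical_url: str) -> str:
    """Return NOINDEX when the requested URL is not the page's canonical
    address, otherwise an empty string (so the real page stays indexable).

    Compares host (case-insensitively) and path, ignoring a trailing slash;
    query strings are ignored in this sketch."""
    req, can = urlsplit(requested_url), urlsplit(canonical_url)
    same = ((req.netloc.lower(), req.path.rstrip("/"))
            == (can.netloc.lower(), can.path.rstrip("/")))
    return "" if same else NOINDEX


canonical = "http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250"
print(repr(robots_meta(canonical, canonical)))
print(robots_meta(canonical + "/Alfreton", canonical))
```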

With pagination and canonicalization there are a few different approaches, and each has its pros and cons. Dr. Pete wrote a really great post on canonicalization that just went out; you can read it here: http://www.seomoz.org/blog/which-page-is-canonical. I also recommend reading Adam Audette's post on pagination options at Search Engine Land: http://searchengineland.com/the-latest-greatest-on-seo-pagination-114284. I hope that helps!

emediaSEO

As long as you think the sitemap is done right, it should be fine.

SarahCollins

Yes, we originally submitted mini sitemaps to Webmaster Tools a couple of months back; as our site is 60K pages, we broke it down by category, etc.

We have not submitted a new sitemap since finding this problem.

We are in the process of using the sitemap generator to generate a new sitemap to see if it picks up anything unusual.
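When regenerating, it is worth making sure only the real URLs go in. A sketch using the sitemaps.org format: each file holds at most 50,000 URLs (the protocol's per-file limit, which a 60K-page site exceeds), and a sitemap index ties the files together. The `is_valid` filter and both helper names are hypothetical; the filter is where a check for the odd two-location URLs would go.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemaps.org protocol


def build_sitemaps(urls, is_valid):
    """Split the URLs that pass `is_valid` (a hypothetical caller-supplied
    filter) into sitemap documents of at most MAX_URLS each.
    Returns a list of UTF-8 encoded XML documents."""
    clean = [u for u in urls if is_valid(u)]
    docs = []
    for start in range(0, len(clean), MAX_URLS):
        urlset = ET.Element("urlset", xmlns=NS)
        for u in clean[start:start + MAX_URLS]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
        docs.append(ET.tostring(urlset, encoding="utf-8", xml_declaration=True))
    return docs


def build_index(sitemap_urls):
    """Build a sitemap index pointing at each individual sitemap file."""
    index = ET.Element("sitemapindex", xmlns=NS)
    for u in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = u
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)
```

Submitting the index rather than each file keeps Webmaster Tools pointed at a single entry that stays current as the category sitemaps are regenerated.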

Are you suggesting we resubmit?

thanks

Sarah

emediaSEO

In the short term I would definitely use canonicals to let Google know which are the right pages until you can fix your problem. Also, have you submitted a sitemap to Webmasters?
