Huge Google index on E-commerce site
-
Hi guys,
I've got a question I can't figure out.
I'm working on an e-commerce site that recently got a CMS update, including URL changes.
We set up 301s on a lot of the old URLs (around 3,000 to 4,000, I'd guess) and submitted a new sitemap (around 12,000 URLs, of which 10,500 are indexed). The strange thing is, when I check the Index Status in Webmaster Tools, Google tells me there are over 98,000 URLs indexed.
A site:domainx.com search tells me there are 111,000 URLs indexed. Another strange thing is one that another forum member describes here:
On top of that, old URLs (which have had a 301 in place for about a month now) keep showing up in the index.
Does anyone know what I could do to solve the problem?
-
Alright guys, thanks a lot for the answers.
I'm going to try some things out this coming Monday.
Canonical URLs and pagination (rel="prev") should work, I guess.
The hard part is that the development company I'm working with tells me they can only redirect all the 404s to the homepage, whereas they really should be redirected to related products or category pages.
So the only solution is for me to do that by hand, one by one, via a tool they built. It's a hell of a job!
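For reference, if the server runs Apache, that one-by-one mapping boils down to rules like these in .htaccess (a minimal sketch; the paths are made up, not our real URLs):

# Each old product/category URL sent to its closest equivalent
Redirect 301 /old-shop/product-123.html http://www.domainx.com/category/widgets/product-123
Redirect 301 /old-shop/category-7.html http://www.domainx.com/category/widgets

# ...as opposed to one blanket rule dumping everything on the homepage:
# RedirectMatch 301 ^/old-shop/ http://www.domainx.com/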
@Andy, I checked it, and it actually says:
Total indexed: 98,000
Ever crawled: 929,762
And when I check the question mark next to "Total indexed", it says:
"Total number of URLs added to Google's index."
Thanks again for your answers.
-
Something to check: in WMT, if you go to the advanced section of the Index Status chart, you should see both "currently in the index" and "ever crawled". It sounds like you are just seeing the "ever crawled" number, which can be huge for almost any website.
-
We had a similar issue with too many indexed pages (about 100,000) for a site with about 3,500 actual pages.
By setting a canonical URL on each page and also preventing Google from indexing and crawling some of the URLs (robots.txt and meta noindex), we are now down to 3,500 URLs. The benefit, besides less duplicate content, is much faster indexing of new pages.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
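The two pieces look roughly like this (a minimal sketch; the URLs and parameter names are invented for illustration):

<!-- In the <head> of every page: point duplicate variants at the preferred URL -->
<link rel="canonical" href="http://www.domainx.com/category/widgets" />

<!-- On pages that should drop out of the index entirely -->
<meta name="robots" content="noindex, follow" />

And in robots.txt, to keep crawlers away from filter and search URLs:

User-agent: *
Disallow: /search
Disallow: /*?sort=

One caveat: don't combine robots.txt blocking with meta noindex on the same URLs. Google can't see a noindex tag on a page it isn't allowed to crawl, so already-indexed pages may linger. Let them be crawled with noindex first, and only block them in robots.txt once they've dropped out of the index.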
-
Hi,
A couple of things could be, and probably are, at work in this situation.
1. For the 301 redirects: if the site is big (12,000 URLs), then depending on how often and how deeply Google crawls it, it can easily take more than a month for Google to find all the new URLs, process the 301s, and update its index. So in this case it is partly a matter of patience: if the 301s are implemented correctly, they will eventually be picked up and the old URLs dropped.
2. You have done 3,000 to 4,000 301s; what are you serving for the rest of the old 12,000 URLs, a 404? Redirecting that many pages is a big undertaking, but it is worth thinking through the technical side of what is happening: part of your 98,000 indexed URLs could be a mix of old and new if the old ones are not being redirected to a page that clearly signals the content is either somewhere else (301) or no longer available (404).
3. A common problem with e-shops is duplicate content caused by product filters, search-string variables, and the like generating pages that are indexable and carry no rel canonical tag. A good way to check is to look for URL parts in your CMS that could cause this (maybe you have filters that produce URLs like xxx?search=123 or xxx?manufacturer=23) and then run a Google search along the lines of site:xxx.com inurl:manufacturer, which should give a good idea of whether and where you have this problem. The duplication could be even more pronounced if it was occurring on both your old CMS URLs AND your new CMS URLs, with a combination of the two sitting in your 98,000 total. A canonical tag on those filter URLs fixes this; see the sketch below.
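For example (hypothetical URLs): every filter variation of a category page would carry a canonical tag pointing at the clean category URL, so Google consolidates them all into a single indexed page:

<!-- Served in the <head> of http://www.domainx.com/category/widgets?manufacturer=23
     and every other parameter variation of that category -->
<link rel="canonical" href="http://www.domainx.com/category/widgets" />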
Hope that helps!