Huge Google index on E-commerce site
-
Hi Guys,
I have a question about something I can't figure out.
I'm working on an e-commerce site that recently got a CMS update, including URL changes.
We set up a lot of 301s on the old URLs (around 3,000 to 4,000, I guess) and submitted a new sitemap (around 12,000 URLs, of which 10,500 are indexed). The strange thing is that when I check the index status in Webmaster Tools, Google tells me there are over 98,000 URLs indexed.
A site:domainx.com search tells me there are 111,000 URLs indexed. Another strange thing, which another forum member describes here:
On top of that, old URLs (which have had a 301 for about a month now) keep showing up in the index.
Does anyone know what I could do to solve the problem?
-
All right guys, thanks a lot for the answers.
I'm going to try some things out this coming Monday.
Canonical URLs and pagination (rel=prev) will work, I guess.
The hard part is that I'm working on this site with a development company that tells me they can redirect all the 404s to the homepage, while they really need to be redirected to other products or category pages.
So the only solution is for me to do it by hand, one by one, using a tool they built. But it's a hell of a job!
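A first pass at that mapping could be scripted rather than done one by one. The sketch below is only an illustration, not something the development company's tool is known to support: it assumes you can export a list of old URLs and a list of new URLs, and it pairs them by how similar their last path segment is. The function names, the example paths, and the 0.6 similarity cutoff are all assumptions; anything that falls through to the fallback page would still need a manual check.

```python
from difflib import get_close_matches
from urllib.parse import urlparse


def slug(url: str) -> str:
    """Return the last path segment of a URL, lowercased, without extension."""
    path = urlparse(url).path.rstrip("/")
    last = path.rsplit("/", 1)[-1]
    return last.rsplit(".", 1)[0].lower()


def suggest_redirects(old_urls, new_urls, fallback, cutoff=0.6):
    """Map each old URL to the new URL whose slug is most similar.

    Old URLs with no match above `cutoff` are mapped to `fallback`
    (for example a category page) so they can be reviewed by hand.
    """
    by_slug = {slug(u): u for u in new_urls}
    mapping = {}
    for old in old_urls:
        match = get_close_matches(slug(old), list(by_slug), n=1, cutoff=cutoff)
        mapping[old] = by_slug[match[0]] if match else fallback
    return mapping


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    new = [
        "http://domainx.com/shoes/red-running-shoes",
        "http://domainx.com/jackets/blue-rain-jacket",
    ]
    old = [
        "http://domainx.com/old/red-running-shoes.html",
        "http://domainx.com/old/xyz-9981.html",
    ]
    for src, dst in suggest_redirects(old, new, "http://domainx.com/shoes/").items():
        print(src, "->", dst)
```

The resulting mapping could then be reviewed and fed into whatever redirect tool is available, which keeps the manual work limited to the URLs that had no obvious match.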
@Andy, I checked it and it actually says:
Total indexed: 98,000
Ever crawled: 929,762
And when I check the question mark next to total indexed, it says:
Total number of URLs added to Google index.
Thanks again for your answers.
-
Something to check: in WMT, if you go to the advanced section of the index status chart, you should see both "currently in the index" and "ever indexed". It sounds like you are just seeing the "ever indexed" number, which could be huge for almost any website.
-
We had similar issues with too many indexed pages (about 100,000 pages) for a site with about 3500 pages.
By setting a canonical URL on each page and also preventing Google from indexing and crawling some of the URLs (robots.txt and meta noindex), we are now down to 3500 URLs. The benefit (besides less duplicate content) is much faster indexing of new pages.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
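Before deploying robots.txt rules like that, it can help to test them against a sample of URLs. The sketch below uses Python's standard urllib.robotparser for this; the robots.txt rules and the URLs are made-up examples, not the ones from our site, and this parser only does simple prefix matching, so it will not reproduce every detail of how Google interprets the file. Keep in mind that a page blocked in robots.txt cannot be crawled, so a meta noindex tag on that page will not be seen.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the paths are examples only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cart
"""


def blocked_urls(robots_txt: str, urls, agent: str = "Googlebot"):
    """Return the URLs that the given robots.txt forbids `agent` to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


if __name__ == "__main__":
    sample = [
        "http://domainx.com/search?q=shoes",
        "http://domainx.com/shoes/red-running-shoes",
        "http://domainx.com/cart",
    ]
    for url in blocked_urls(ROBOTS_TXT, sample):
        print("blocked:", url)
```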
-
Hi,
A couple of things could be and probably are at work in this situation.
1. For the 301 redirects: if the site is big (12,000 URLs), then depending on how often and how much Google crawls the site, it could easily take more than a month for it to find all the new URLs and 301 redirects and then update its cache of indexed pages. So in this case it is a matter of patience. If the 301s are implemented correctly, the new URLs will eventually be indexed.
2. You have done 3,000 or 4,000 301s; what are you showing for the rest of the old 12,000 URLs, a 404? It is a big undertaking to redirect that many pages, but it is worth thinking about the technical side of what is happening. Part of your 98,000 indexed URLs could be a mix of old and new if the old ones do not return a response that clearly states they are either somewhere else (301) or no longer available (404).
3. A common problem with e-shops is duplicate content caused by things like product filters and search string variables that lead to pages that are indexable and do not have rel canonical tags. A good way to see if this is the case is to look for likely URL parts in your CMS that could lead to this issue (maybe you have filters that result in URLs like xxx?search=123 or xxx?manufacturer=23) and then do a Google search along the lines of site:xxx.com inurl:manufacturer, which should give a good idea of whether and where you have this problem. This kind of duplicate content could be even more pronounced if it was occurring on your old CMS URLs AND your new CMS URLs, and a combination of these is in your 98,000 total.
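To get a feel for how much of point 3 applies, you could run a list of crawled or indexed URLs through a small script that strips the filter parameters and groups what is left. This is only a rough sketch: the parameter names are taken from the examples above and are assumptions about your CMS, and the function names are made up for illustration. The stripped URL is also a reasonable candidate for the rel canonical target of each variant.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters assumed to create duplicate views of the same page.
# "search" and "manufacturer" come from the examples above; adjust for your CMS.
FILTER_PARAMS = {"search", "manufacturer"}


def canonical_url(url: str) -> str:
    """Drop filter parameters and the fragment, keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in FILTER_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Group URLs by canonical form and keep only groups with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return {canon: members for canon, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    crawled = [
        "http://xxx.com/shoes",
        "http://xxx.com/shoes?manufacturer=23",
        "http://xxx.com/shoes?search=123",
        "http://xxx.com/hats",
    ]
    for canon, members in group_duplicates(crawled).items():
        print(canon, "has", len(members), "variants")
```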
Hope that helps!