Huge Google index on E-commerce site
-
Hi Guys,
I got a question which i can't understand.
I'm working on a e-commerce site which recently got a CMS update including URL updates.
We did a lot of 301's on the old url's (around 3000 /4000 i guess) and submitted a new sitemap (around 12.000 urls, of which 10.500 are indexed).The strange thing is.. When i check the indexing status in webmaster tools Google tells me there are over 98.000 url's indexed.
Doing the site:domainx.com Google tells me there are 111.000 url's indexed.Another strange thing which another forum member describes here :
And next to that old url's (which have a 301 for about a month now) keep showing up in the index.
Does anyone know what i could do to solve the problem?
-
Allright guys, thanks alot for the answers.
Gonna try some things out coming monday.
Canonical url's and pagination (rel=prev) will work i guess.
The hard part is, i'm working on this site with a development company that tells me they can url redirect all the 404's to the homepage while they must be redirected either to other products or category pages.
So only solution is that i have to do that by hand, one by one via a tool they build. But it's a hell of a job!
@ Andy , I checked it and it actually says :
Total indexed : 98.000
Ever crawled: 929.762And when i check the questionmark at total indexed it says:
Total number of url's added to Google index.Thanks again for your answers
-
something to check would be in WMT if you go to the advanced section of the index status chart you should see currently in the index and ever indexed, it sounds like you are just seeing the ever indexed number which could be huge for almost any website.
-
We had similar issues with too many indexed pages (about 100,000 pages) for a site with about 3500 pages.
By setting a canonical url on each page and also preventing google from indexing and crawling some of the urls (robots.txt and meta noindex) we are now down to 3500 urls, The benefit is (besides less duplicate content), much faster indexing of new pages.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
-
Hi,
A couple of things could be and probably are at work in this situation.
1. For the 301 redirects, if the site is big (12000 urls), depending on how often and much google crawls the site it could easily take more than a month for it to find and identify all the new urls/301 redirects etc and then update its cache of indexed pages. So in this case its is a matter of patience. If the 301s are implemented correctly, they will eventually be indexed.
2. You have done 3 or 4000 301s, for the rest of the the old 12000 urls what are you showing, a 404? It is a big undertaking to redirect that many pages, but worth thinking about the technical side of what is happening, part of your 98000 indexed urls could be a mix of old and new if the old ones are not being redirected to a page that clearly states that they are either somewhere else (301) or no longer available (404).
3. A common problem with e-shops is duplicate content due to various things like product filters, search string variables etc that are going to pages that are indexable and do not have rel canonical tags. A good way to see if this is the case is to search for likely url parts in your cms that could lead to this issue (maybe you have filters that result in urls like xxx?search=123 or xxx?manufacturer=23 etc) and then do a google search along the lines of site:xxx.com inurl:manufacturer which should give a good idea of if/where you have this problem. This case of duplicate content could be even more pronounced if it was occurring on your old cms urls AND your new cms urls and a combination of these are in your 98000 total.
Hope that helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should m-dot sites be indexed at all
I have a client with a site with a m-dot mobile version. They will move it to a responsive site sometime next year but in meanwhile I have a massive doubt. This m-dot site has some 30k indexed pages in Google. Each of this page is bidirectionally linked to the www. version (rel="alternate on the www, rel canonical on the m-dot) There is no noindex on the m-dot site, so I understand that Google might decide to index the m-dot pages regardless of the canonical to the www site. But my doubts stays: is it a bad thing that both the version are indexed? Is this having a negative impact on the crawling budget? Or risking some other bad consequence? and how is the mobile-first going to impact on this? Thanks
Intermediate & Advanced SEO | | newbiebird0 -
Cache and index page of Mobile site
Hi, I want to check cache and index page of mobile site. I am checking it on mobile phone but it is showing the cache version of desktop. So anybody can tell me the way(tool, online tool etc.) to check mobile site index and cache page.
Intermediate & Advanced SEO | | vivekrathore0 -
Google indexed wrong pages of my website.
When I google site:www.ayurjeewan.com, after 8 pages, google shows Slider and shop pages. Which I don't want to be indexed. How can I get rid of these pages?
Intermediate & Advanced SEO | | bondhoward0 -
Site not indexed in Google UK
This site was moved to a new host by the client a month back and is still not indexed in Google UK if you search for the site directly. www.loftconversionswestsussex.com Webmaster tools shows that 55 pages have been crawled and no errors have been detected. The client also tried the "Fetch as Google Bot" tactic in GWT as well as running a PPC campaign and the site is still not appearing in Google. Any thoughts please? Cheers, SEO5..
Intermediate & Advanced SEO | | SEO5Team0 -
Can Google index PDFs with flash?
Does anyone know if Google can index PDF with Flash embedded? I would assume that the regular flash recommendations are still valid, even when embedded in another document. I would assume there is a list of the filetype and version which Google can index with the search appliance, but was not able to find any. Does anyone have a link or a list?
Intermediate & Advanced SEO | | andreas.wpv0 -
Indexation of content from internal pages (registration) by Google
Hello, we are having quite a big amount of content on internal pages which can only be accessed as a registered member. What are the different options the get this content indexed by Google? In certain cases we might be able to show a preview to visitors. In other cases this is not possible for legal reasons. Somebody told me that there is an option to send the content of pages directly to google for indexation. Unfortunately he couldn't give me more details. I only know that this possible for URLs (sitemap). Is there really a possibility to do this for the entire content of a page without giving google access to crawl this page? Thanks Ben
Intermediate & Advanced SEO | | guitarslinger0 -
E commerce
Hi there. I'm currently optimizing ecommece websites for my company. The problem is, we have a duplication module whereby we duplicate sites accordingly to other countries. The on page analysis shows about 2,000 of duplicated content. How do i resolve this issue? I was planning to instruct the writers to write different content across different countries. Any suggestion on this? thanks
Intermediate & Advanced SEO | | k3zuya0 -
Google Not Indexing Description or correct title (very technical)
Hey guys, I am managing the site: http://www.theattractionforums.com/ If you search the keyword "PUA Forums", it will be in the top 10 results, however the title of the forum will be "PUA Forums" rather than using the code in the title tag, and no description will display at all (despite there being one in the code). Any page other than the home-page that ranks shows the correct title and description. We're completely baffled! Here are some interesting bits and pieces: It shows up fine on Bing If I go into GWT and Fetch as Google Bot, it shows up as "Unreachable" when I try to pull the home-page. We previously found that it was pulling 'index.htm' before 'index.php' - and this was pulling a blank page. I've fixed this in the .htaccess however to make it redirect, however this hasn't solved the problem. I've disallowed it from pulling the description .etc from the Open Directory with the use of meta tags - didn't change anything. It's vBulletin and is running vBSEO Any suggestions at all guys? I'll be forever in anyones debt who can solve this, it's proving to be near impossible to fix. Here is the .htaccess file, it may be a part of the issue: RewriteEngine On DirectoryIndex index.php index.html Redirect /index.html http://www.theattractionforums.com/index.php RewriteCond %{HTTP_HOST} !^www.theattractionforums.com
Intermediate & Advanced SEO | | trx
RewriteRule (.*) http://www.theattractionforums.com/$1 [L,R=301] RewriteRule ^((urllist|sitemap_).*.(xml|txt)(.gz)?)$ vbseo_sitemap/vbseo_getsitemap.php?sitemap=$1 [L] RewriteCond %{REQUEST_URI} !(admincp/|modcp/|cron|vbseo_sitemap/)
RewriteRule ^((archive/)?(..php(/.)?)?)$ vbseo.php [L,QSA] RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !^(admincp|modcp|clientscript|cpstyles|images)/
RewriteRule ^(.+)$ vbseo.php [L,QSA]
RewriteRule ^forum/(.*)$ http://www.theattractionforums.com/$1 [R=301,L]0