Huge Google index on E-commerce site
-
Hi guys,
I have a question about something I can't figure out.
I'm working on an e-commerce site that recently got a CMS update, including URL changes.
We set up a lot of 301 redirects for the old URLs (around 3,000 to 4,000, I guess) and submitted a new sitemap (around 12,000 URLs, of which 10,500 are indexed). The strange thing is that when I check the index status in Webmaster Tools, Google tells me there are over 98,000 URLs indexed.
A site:domainx.com search tells me there are 111,000 URLs indexed. Another strange thing, which another forum member describes here:
On top of that, old URLs (which have had a 301 in place for about a month now) keep showing up in the index.
Does anyone know what I could do to solve the problem?
-
All right guys, thanks a lot for the answers.
I'm going to try some things out this coming Monday.
Canonical URLs and pagination markup (rel=prev) should work, I guess.
The hard part is that I'm working on this site with a development company that tells me they can redirect all the 404s to the homepage, whereas they should be redirected to other products or to category pages.
So the only solution is for me to do that by hand, one by one, via a tool they built. But it's a hell of a job!
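One way to cut down the manual work might be to generate suggested targets first and only review them by hand. The following is a minimal Python sketch, not a description of the development company's tool: the function name `suggest_redirects`, the example URLs, and the similarity cutoff are all assumptions for illustration. It compares the path of each old URL with the paths of the new URLs using the standard library's `difflib` and falls back to a category page when nothing is similar enough.

```python
from difflib import get_close_matches
from urllib.parse import urlsplit


def suggest_redirects(old_urls, new_urls, fallback, cutoff=0.6):
    """Suggest a 301 target for each old URL based on path similarity.

    The result is only a list of suggestions for manual review.
    `fallback` (for example a category page) is used when no new path
    is at least `cutoff` similar to the old path.
    """
    # Map each new path back to its full URL so we can return full targets.
    new_by_path = {urlsplit(url).path: url for url in new_urls}
    candidates = list(new_by_path)

    suggestions = {}
    for old in old_urls:
        old_path = urlsplit(old).path
        match = get_close_matches(old_path, candidates, n=1, cutoff=cutoff)
        suggestions[old] = new_by_path[match[0]] if match else fallback
    return suggestions
```

Only the path is compared, so old URLs that differ solely by query string (for example `index.php?id=...`) will end up on the fallback and still need a manual decision.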
@Andy, I checked it and it actually says:
Total indexed: 98,000
Ever crawled: 929,762
And when I check the question mark next to "Total indexed", it says:
Total number of URLs added to Google index.
Thanks again for your answers.
-
Something to check: in WMT, if you go to the advanced section of the Index Status chart, you should see both the number currently in the index and the number ever indexed. It sounds like you are just seeing the "ever indexed" number, which could be huge for almost any website.
-
We had similar issues with too many indexed pages (about 100,000) for a site with about 3,500 pages.
By setting a canonical URL on each page, and also preventing Google from indexing and crawling some of the URLs (robots.txt and meta noindex), we are now down to 3,500 URLs. The benefit, besides less duplicate content, is much faster indexing of new pages.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
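To illustrate the canonical part, here is a minimal Python sketch of how a canonical URL could be derived from a filtered or sorted URL. It is not our actual implementation: the parameter names in `NON_CANONICAL_PARAMS` and the function names are assumptions, so swap in whatever your CMS really generates.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that only filter, sort, or track and
# therefore should not produce a separate indexed page.
NON_CANONICAL_PARAMS = {"sort", "search", "manufacturer", "sessionid"}


def canonical_url(url):
    """Return the URL without non-canonical parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in NON_CANONICAL_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag for the page's <head>."""
    return '<link rel="canonical" href="{}">'.format(escape(canonical_url(url), quote=True))
```

Every variant of a page (sorted, filtered, with a session ID) would then carry the same tag pointing at the one URL you want indexed.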
-
Hi,
A couple of things could be and probably are at work in this situation.
1. For the 301 redirects: if the site is big (12,000 URLs), then depending on how often and how much Google crawls it, it could easily take more than a month for Google to find all the new URLs and 301 redirects and then update its index. So in this case it is a matter of patience. If the 301s are implemented correctly, the new URLs will eventually replace the old ones in the index.
2. You have done 3,000 or 4,000 301s. What are you showing for the rest of the old 12,000 URLs, a 404? Redirecting that many pages is a big undertaking, but it is worth thinking about the technical side of what is happening. Part of your 98,000 indexed URLs could be a mix of old and new if the old ones do not clearly signal that the page is either somewhere else (301) or no longer available (404).
3. A common problem with e-shops is duplicate content caused by things like product filters and search string variables, which lead to pages that are indexable and do not have rel canonical tags. A good way to see whether this is the case is to look for likely URL parts in your CMS that could cause it (maybe you have filters that result in URLs like xxx?search=123 or xxx?manufacturer=23) and then do a Google search along the lines of site:xxx.com inurl:manufacturer, which should give a good idea of whether and where you have this problem. This kind of duplicate content could be even more pronounced if it was occurring on both your old CMS URLs and your new CMS URLs, and a combination of these is in your 98,000 total.
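For point 3, if you can export a list of crawled or indexed URLs (from a crawler or your server logs), a quick count of query parameter names shows which ones are worth testing with site: and inurl: searches. This is a minimal Python sketch; the function name `count_query_parameters` and the example URLs are assumptions for illustration only.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_query_parameters(urls):
    """Count in how many URLs each query parameter name appears."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # Use a set so a parameter repeated within one URL is counted once.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts
```

Calling `count_query_parameters(urls).most_common(10)` on the exported list would show the parameters that appear in the most URLs, which are the likeliest sources of duplicate pages.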
Hope that helps!