Huge Google index on E-commerce site
-
Hi guys,
I have a question I can't figure out.
I'm working on an e-commerce site that recently got a CMS update, including URL changes.
We set up a lot of 301s on the old URLs (around 3,000 to 4,000, I think) and submitted a new sitemap (around 12,000 URLs, of which 10,500 are indexed). The strange thing is that when I check the indexing status in Webmaster Tools, Google tells me there are over 98,000 URLs indexed.
A site:domainx.com search says there are 111,000 URLs indexed. Another strange thing, which another forum member describes here:
On top of that, old URLs (which have had a 301 for about a month now) keep showing up in the index.
Does anyone know what I could do to solve the problem?
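Since the question says the old URLs have returned a 301 for about a month, one quick sanity check is to crawl a sample of them with redirect-following turned off and label each response. The sketch below assumes the URLs really changed (so a 200 on an old URL is suspicious) and uses hypothetical example data in place of real crawl results; the labels are a heuristic triage, not Google's own classification.

```python
from urllib.parse import urlsplit

PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}

def classify(status, location=None, expected_target=None):
    """Label how an old URL responds, from an indexing point of view."""
    if status in PERMANENT:
        if not location:
            return "permanent redirect without Location header"
        # A bare domain or a "/" path means the redirect points at the homepage.
        if urlsplit(location).path in ("", "/"):
            return "redirect to homepage (may be treated as a soft 404)"
        if expected_target and location != expected_target:
            return "redirect to unexpected target"
        return "ok: permanent redirect"
    if status in TEMPORARY:
        return "temporary redirect (should probably be a 301)"
    if status in (404, 410):
        return "ok: gone"
    if status == 200:
        return "still live (possible duplicate of the new URL)"
    return f"unexpected status {status}"

# Hypothetical crawl results: old URL -> (HTTP status, Location header or None).
crawl = {
    "https://shop.example/old/red-sneaker.html": (301, "https://shop.example/p/red-sneaker"),
    "https://shop.example/old/blue-boot.html": (301, "https://shop.example/"),
    "https://shop.example/old/green-hat.html": (302, "https://shop.example/p/green-hat"),
    "https://shop.example/old/sold-out.html": (200, None),
}

for url, (status, location) in crawl.items():
    print(f"{url}: {classify(status, location)}")
```

Anything labelled as a temporary redirect, a homepage redirect, or still live is a candidate explanation for old URLs lingering in the index.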
-
All right guys, thanks a lot for the answers.
I'm going to try some things out this coming Monday.
Canonical URLs and pagination (rel=prev) will work, I guess.
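The canonical-plus-pagination plan can be sketched as a small helper that emits the `<link>` tags for one page of a category listing. It assumes a hypothetical `?page=N` URL scheme with page 1 served at the bare category URL, and gives each page a self-referencing canonical rather than pointing every page at page 1, which would hide the products listed on later pages. Google later (2019) said it no longer uses `rel=prev`/`rel=next` as an indexing signal, so treat those two tags as optional.

```python
from urllib.parse import urlencode

def pagination_links(base_url, page, last_page, param="page"):
    """Return <link> tags for one page of a paginated category listing.

    Assumes (hypothetically) that page 1 lives at base_url and later
    pages at base_url?page=N.
    """
    def url_for(n):
        return base_url if n == 1 else f"{base_url}?{urlencode({param: n})}"

    # Each page canonicalizes to itself, not to page 1.
    tags = [f'<link rel="canonical" href="{url_for(page)}">']
    if page > 1:
        tags.append(f'<link rel="prev" href="{url_for(page - 1)}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{url_for(page + 1)}">')
    return tags

for tag in pagination_links("https://shop.example/shoes", page=2, last_page=3):
    print(tag)
```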
The hard part is that I'm working on this site with a development company that tells me they can redirect all the 404s to the homepage, whereas they need to be redirected to other product or category pages.
So the only solution is to do it by hand, one by one, via a tool they built. But it's a hell of a job!
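Rather than mapping every 404 by hand, a first pass can often be automated so that only the leftovers need manual work in the development company's tool. A minimal sketch, assuming (hypothetically) that old and new product URLs share the same slug in their last path segment and that the first path segment of an old URL names its category; `new_products` and `new_categories` are illustrative lookups you would build from the new CMS.

```python
from urllib.parse import urlsplit

def build_redirect_map(old_urls, new_products, new_categories):
    """First-pass 301 map for old URLs that currently return 404.

    old_urls:       iterable of old URLs
    new_products:   dict of product slug -> new product URL (hypothetical lookup)
    new_categories: dict of category slug -> new category URL (hypothetical lookup)

    Returns (redirects, unmatched). Unmatched URLs are left for manual
    review instead of being pointed at the homepage.
    """
    redirects, unmatched = {}, []
    for old in old_urls:
        segments = [s for s in urlsplit(old).path.split("/") if s]
        slug = segments[-1] if segments else ""
        if slug.endswith(".html"):
            slug = slug[: -len(".html")]
        if slug in new_products:
            # Same product exists in the new CMS: redirect to it.
            redirects[old] = new_products[slug]
        elif segments and segments[0] in new_categories:
            # Product is gone but its category still exists: redirect there.
            redirects[old] = new_categories[segments[0]]
        else:
            unmatched.append(old)
    return redirects, unmatched

# Hypothetical example data.
new_products = {"red-sneaker": "https://shop.example/p/red-sneaker"}
new_categories = {"shoes": "https://shop.example/c/shoes"}
old_404s = [
    "https://old.example/shoes/red-sneaker.html",
    "https://old.example/shoes/discontinued-boot.html",
    "https://old.example/misc/mystery-item.html",
]
redirects, unmatched = build_redirect_map(old_404s, new_products, new_categories)
```

Everything in `unmatched` still needs a human decision; nothing is sent to the homepage by default.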
@Andy, I checked it and it actually says:
Total indexed: 98,000
Ever crawled: 929,762
And when I check the question mark next to "Total indexed", it says:
"Total number of URLs added to Google index."
Thanks again for your answers.
-
Something to check: in WMT, go to the Advanced section of the Index Status chart, where you should see both the "currently in the index" and the "ever indexed" figures. It sounds like you are just seeing the ever-indexed number, which can be huge for almost any website.
-
We had similar issues with too many indexed pages (about 100,000) for a site with about 3,500 pages.
By setting a canonical URL on each page and also preventing Google from crawling and indexing some of the URLs (robots.txt and meta noindex), we are now down to 3,500 URLs. The benefit (besides less duplicate content) is much faster indexing of new pages.
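Before tightening robots.txt, it helps to check which URLs a draft file would actually block. The sketch below uses Python's standard `urllib.robotparser`; the rules and URLs are hypothetical. Two caveats: this parser only does prefix matching, so Google-style `*` and `$` wildcards are not evaluated the way Googlebot evaluates them, and a URL blocked in robots.txt cannot have its meta noindex read, so pages you want dropped from the index should stay crawlable until the noindex has been seen.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft robots.txt (prefix rules only; no wildcards).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /checkout/
"""

def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]

urls = [
    "https://shop.example/search?q=shoes",
    "https://shop.example/checkout/step1",
    "https://shop.example/shoes/red-sneaker",
]
print(blocked_urls(ROBOTS_TXT, urls))
```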
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
-
Hi,
A few things could be, and probably are, at work in this situation.
1. For the 301 redirects: if the site is big (12,000 URLs), then depending on how often and how much Google crawls it, it could easily take more than a month for Google to find all the new URLs and 301 redirects and then update its index. So in this case it is a matter of patience. If the 301s are implemented correctly, they will eventually be picked up.
2. You have done 3,000 or 4,000 301s; what are the rest of the roughly 12,000 old URLs showing, a 404? It is a big undertaking to redirect that many pages, but it's worth thinking about the technical side of what is happening: part of your 98,000 indexed URLs could be a mix of old and new if the old ones are not returning a response that clearly states that they have either moved (301) or are no longer available (404).
3. A common problem with e-shops is duplicate content caused by things like product filters and search-string variables that lead to pages that are indexable and have no rel=canonical tag. A good way to see if this is the case is to look for URL parts in your CMS that could cause it (maybe you have filters that produce URLs like xxx?search=123 or xxx?manufacturer=23) and then run a Google search along the lines of site:xxx.com inurl:manufacturer, which should give a good idea of whether, and where, you have this problem. This duplicate content could be even more pronounced if it occurs on both your old CMS URLs and your new CMS URLs, and a combination of these is in your 98,000 total.
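Point 3 can be sketched in two small helpers: one that derives a canonical URL by dropping filter and search parameters, and one that turns the parameters found in a list of crawled URLs into site:/inurl: searches. The parameter names in `NON_CANONICAL_PARAMS` are hypothetical examples; `page` is deliberately kept because different listing pages show different products.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical filter/search parameters that should not create indexable URLs.
NON_CANONICAL_PARAMS = {"search", "manufacturer", "sort", "color"}

def canonical_url(url):
    """Drop filter/search parameters (and any fragment) to get the canonical URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in NON_CANONICAL_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def inurl_queries(urls, domain):
    """Build site:/inurl: searches for every query parameter seen, most frequent first."""
    counts = Counter(name for u in urls for name, _ in parse_qsl(urlsplit(u).query))
    return [f"site:{domain} inurl:{name}" for name, _ in counts.most_common()]

# Hypothetical crawl sample.
crawled = [
    "https://shop.example/shoes?manufacturer=23",
    "https://shop.example/shoes?manufacturer=5&sort=price",
    "https://shop.example/hats",
]
print(canonical_url("https://shop.example/shoes?manufacturer=23&page=2"))
print(inurl_queries(crawled, "shop.example"))
```

The canonical value would then go into each page's `<link rel="canonical" href="...">` tag.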
Hope that helps!
Related Questions
-
My URL disappeared from Google but Search Console shows it as indexed. This URL has been indexed for more than a year. Please help!
Super weird problem that I haven't been able to solve for the last 5 hours. One of my URLs, https://www.dcacar.com/lax-car-service.html, has been indexed for more than a year and also has an AMP version. A few hours ago I realized that it had disappeared from the SERPs. We were ranking on page 1 for several key terms. When I search for "site:dcacar.com", the URL is nowhere to be found on any of the 5 pages. But when I check Google Search Console, it shows as indexed. I requested indexing again but nothing changed. All the other 50 or so URLs are not affected at all; this is the only URL that has gone missing. Can someone solve this mystery for me, please? Thanks a lot in advance.
Intermediate & Advanced SEO | | Davit19850 -
Google Only Indexing Canonical Root URL Instead of Specified URL Parameters
We launched a website about a month ago and noticed that Google was indexing, but not displaying, URLs with "?location=" parameters such as http://www.castlemap.com/local-house-values/?location=great-falls-virginia and http://www.castlemap.com/local-house-values/?location=mclean-virginia. Instead, Google has only been displaying our root URL http://www.castlemap.com/local-house-values/ in its search results, which we don't want, as the URLs with specific locations are more important and each has its own unique list of houses for sale. We have Yoast set up with all of these ?location values added to our sitemap, which has been successfully submitted to Google: http://www.castlemap.com/buy-location-sitemap.xml. I also tried going into the old Google Search Console and setting the "location" URL parameter to "Crawl Every URL" with the "Specifies" effect enabled, and I even see the two URLs mentioned above in Google's list of parameter samples, but the pages are still not being added to Google. Even after requesting indexing again a few days ago, after making all of these changes, these URLs still show as "Allowing Indexing" but "Not On Google" in Search Console, and they don't show up on Google when I manually search for the entire URL. Why are these pages not showing up on Google, and how can we get them to display? The only solution I can think of is to set our main /local-house-values/ page to noindex so that Google favors all of our other URL parameter versions, but I'm guessing that's probably not a good solution for multiple reasons.
Intermediate & Advanced SEO | | Nitruc0 -
Google Is Indexing My 301 Redirects to Other Sites
Long story, but I now have a few links on my site that 301-redirect to YouTube videos or e-commerce stores. They carry a considerable amount of traffic that I benefit from, so I can't take them down. That traffic comes from people on other websites, so basically I have backlinks from places I don't own pointing to my redirect URLs (e.g. http://example.com/redirect). My problem is that Google is indexing them and won't let them go. I have tried blocking that URL in robots.txt, but Google still indexes it without crawling it. I have also tried allowing Google to crawl it and adding noindex in robots.txt, and I have tried removing it via GWT, but it pops back up after a few days. Any ideas? Thanks!
Intermediate & Advanced SEO | | cuarto7150 -
Schema for E-Commerce websites
Hi guys. I am running a cleanup of the on-page schema we use and will be moving the on-page elements into Tag Manager. I have all the metas and schema for the products boxed off. My question today is what schema I should use for category pages. Granted, there is JSON-LD for aggregate reviews, but I can't work out how or what to use for the category pages that list products. Any assistance appreciated. Alex
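One option for a category page is schema.org's `ItemList`, with one `ListItem` (position plus URL) per product shown on the page. The sketch below generates that JSON-LD as a script tag; the function name and example URLs are hypothetical, and marking up a list this way is no guarantee that Google shows any rich result for a category page.

```python
import json

def category_itemlist(category_name, product_urls):
    """Build an ItemList JSON-LD script tag for the products on a category page."""
    data = {
        "@context": "https://schema.org",
        "@type": "ItemList",
        "name": category_name,
        "itemListElement": [
            {"@type": "ListItem", "position": i, "url": url}
            for i, url in enumerate(product_urls, start=1)
        ],
    }
    return f'<script type="application/ld+json">{json.dumps(data)}</script>'

print(category_itemlist("Shoes", [
    "https://shop.example/p/red-sneaker",
    "https://shop.example/p/blue-boot",
]))
```

Positions follow the order the products appear on the page, so the markup should be regenerated whenever the listing order changes.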
Intermediate & Advanced SEO | | JBGlobalSEO1 -
E-Commerce Mobile Pagination Dilemma
Hi everybody, I'm managing the SEO for an e-commerce site with separate desktop and mobile sites (meaning, not responsive). We're changing the way reviews on mobile product pages are displayed from "view all" to pagination (due to server load). Basically, the above-the-fold part of the page will always display the product, and below the fold there will be X reviews on each page. But here is where it gets tricky: 1 - A different number of review pages will exist on mobile vs. desktop (due to a different number of reviews per page on each device), so I'm wondering what the solution is regarding canonicals. Usually every mobile page points to its desktop equivalent, but now we'll have non-matching pages. 2 - Users will be able to change the number of reviews displayed on each page, so the number of paginated pages will change accordingly. I was thinking about a solution where all the reviews are in the first page's HTML (with only X of them displayed on screen), and all the other paginated pages are created dynamically (with # and not indexed, so basically no pagination on mobile). Does anyone think this could be seen as cloaking, or have any other thoughts? Thanks, Sarah
Intermediate & Advanced SEO | | Don340 -
Recommended e-commerce site search for Magento?
Does anyone have recommendations for site search on large e-commerce sites based on Magento? Some (hopeful) requirements:
1. Possibility to segment product pages and blog content on the results page
2. Doesn't cause any major SEO or technical issues
3. Understands semantic search
4. Ability to filter results
5. Ability to sort (e.g. by price, popularity, new in stock)
It'd be really useful to see examples and know if there are any particular issues to be aware of. Thanks. 🙂
Intermediate & Advanced SEO | | Alex-Harford0 -
[E-commerce] Duplicate content due to color variations (canonical/indexing)
Hello, we currently have a lot of color variations on multiple products with almost the same content. Even with our canonicals set, Moz's crawling tool seems to flag them as duplicate content. What we have done so far: we chose the best-selling color variation (our "master product") and added a rel="canonical" to every variation (with our "master product" as the canonical URL). In my opinion, that should be enough to address this issue. However, given that Moz flags them as duplicates, I was wondering if there is something else we should do. Should we add "noindex,follow" to our child products and "index,follow" to our master product? (That sounds to me like a heavy change.) Thank you in advance
Intermediate & Advanced SEO | | EasyLounge0 -
Huge Google index with irrelevant pages
Hi, I run a site about sports matches. Every match has a page, and the pages are generated automatically from the DB. The pages are not duplicates, but over time some look a little similar. After a match finishes, its page has no internal links or sitemap entry, but it's reachable by direct URL and stays in Google's index, so over time we have more than 100,000 indexed pages. Since past matches have no significance, they're not linked, and a match can repeat (which may look like duplicate content), what do you suggest we do when a match is finished (no longer linked, but still in the index and SERPs)?
1. 301 redirect the match page to the match category, which is higher in the hierarchy and always relevant?
2. Use rel=canonical pointing to the match category?
3. Do nothing?
Some concerns:
* A 301 redirect will shrink my index status, and some say a high index status is good.
* Is it safe to 301 redirect 100,000 pages at once? Wouldn't it look strange to Google?
* Would a canonical remove the past match pages from the index?
What do you think? Thanks, Assaf.
Intermediate & Advanced SEO | | stassaf0