Huge Google index on E-commerce site
-
Hi Guys,
I got a question which i can't understand.
I'm working on a e-commerce site which recently got a CMS update including URL updates.
We did a lot of 301's on the old url's (around 3000 /4000 i guess) and submitted a new sitemap (around 12.000 urls, of which 10.500 are indexed).The strange thing is.. When i check the indexing status in webmaster tools Google tells me there are over 98.000 url's indexed.
Doing the site:domainx.com Google tells me there are 111.000 url's indexed.Another strange thing which another forum member describes here :
And next to that old url's (which have a 301 for about a month now) keep showing up in the index.
Does anyone know what i could do to solve the problem?
-
Allright guys, thanks alot for the answers.
Gonna try some things out coming monday.
Canonical url's and pagination (rel=prev) will work i guess.
The hard part is, i'm working on this site with a development company that tells me they can url redirect all the 404's to the homepage while they must be redirected either to other products or category pages.
So only solution is that i have to do that by hand, one by one via a tool they build. But it's a hell of a job!
@ Andy , I checked it and it actually says :
Total indexed : 98.000
Ever crawled: 929.762And when i check the questionmark at total indexed it says:
Total number of url's added to Google index.Thanks again for your answers
-
something to check would be in WMT if you go to the advanced section of the index status chart you should see currently in the index and ever indexed, it sounds like you are just seeing the ever indexed number which could be huge for almost any website.
-
We had similar issues with too many indexed pages (about 100,000 pages) for a site with about 3500 pages.
By setting a canonical url on each page and also preventing google from indexing and crawling some of the urls (robots.txt and meta noindex) we are now down to 3500 urls, The benefit is (besides less duplicate content), much faster indexing of new pages.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
-
Hi,
A couple of things could be and probably are at work in this situation.
1. For the 301 redirects, if the site is big (12000 urls), depending on how often and much google crawls the site it could easily take more than a month for it to find and identify all the new urls/301 redirects etc and then update its cache of indexed pages. So in this case its is a matter of patience. If the 301s are implemented correctly, they will eventually be indexed.
2. You have done 3 or 4000 301s, for the rest of the the old 12000 urls what are you showing, a 404? It is a big undertaking to redirect that many pages, but worth thinking about the technical side of what is happening, part of your 98000 indexed urls could be a mix of old and new if the old ones are not being redirected to a page that clearly states that they are either somewhere else (301) or no longer available (404).
3. A common problem with e-shops is duplicate content due to various things like product filters, search string variables etc that are going to pages that are indexable and do not have rel canonical tags. A good way to see if this is the case is to search for likely url parts in your cms that could lead to this issue (maybe you have filters that result in urls like xxx?search=123 or xxx?manufacturer=23 etc) and then do a google search along the lines of site:xxx.com inurl:manufacturer which should give a good idea of if/where you have this problem. This case of duplicate content could be even more pronounced if it was occurring on your old cms urls AND your new cms urls and a combination of these are in your 98000 total.
Hope that helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Subdirectory site / 301 Redirects / Google Search Console
Hi There, I'm a web developer working on an existing WordPress site (Site #1) that has 900 blog posts accessible from this URL structure: www.site-1.com/title-of-the-post We've built a new website for their content (Site #2) and programmatically moved all blog posts to the second website. Here is the URL structure: www.site-1.com/site-2/title-of-the-post Site #1 will remain as a normal company site without a blog, and Site #2 will act as an online content membership platform. The original 900 posts have great link juice that we, of course, would like to maintain. We've already set up 301 redirects that take care of this process. (ie. the original post gets redirected to the same URL slug with '/site-2/' added. My questions: Do you have a recommendation about how to best handle this second website in Google Search Console? Do we submit this second website as an additional property in GSC? (which shares the same top-level-domain as the original) Currently, the sitemap.xml submitted to Google Search Console has all 900 blog posts with the old URLs. Is there any benefit / drawback to submitting another sitemap.xml from the new website which has all the same blog posts at the new URL. Your guidance is greatly appreciated. Thank you.
Intermediate & Advanced SEO | | HimalayanInstitute0 -
E-Commerce Site Collection Pages Not Being Indexed
Hello Everyone, So this is not really my strong suit but I’m going to do my best to explain the full scope of the issue and really hope someone has any insight. We have an e-commerce client (can't really share the domain) that uses Shopify; they have a large number of products categorized by Collections. The issue is when we do a site:search of our Collection Pages (site:Domain.com/Collections/) they don’t seem to be indexed. Also, not sure if it’s relevant but we also recently did an over-hall of our design. Because we haven’t been able to identify the issue here’s everything we know/have done so far: Moz Crawl Check and the Collection Pages came up. Checked Organic Landing Page Analytics (source/medium: Google) and the pages are getting traffic. Submitted the pages to Google Search Console. The URLs are listed on the sitemap.xml but when we tried to submit the Collections sitemap.xml to Google Search Console 99 were submitted but nothing came back as being indexed (like our other pages and products). We tested the URL in GSC’s robots.txt tester and it came up as being “allowed” but just in case below is the language used in our robots:
Intermediate & Advanced SEO | | Ben-R
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /9545580/checkouts
Disallow: /carts
Disallow: /account
Disallow: /collections/+
Disallow: /collections/%2B
Disallow: /collections/%2b
Disallow: /blogs/+
Disallow: /blogs/%2B
Disallow: /blogs/%2b
Disallow: /design_theme_id
Disallow: /preview_theme_id
Disallow: /preview_script_id
Disallow: /apple-app-site-association
Sitemap: https://domain.com/sitemap.xml A Google Cache:Search currently shows a collections/all page we have up that lists all of our products. Please let us know if there’s any other details we could provide that might help. Any insight or suggestions would be very much appreciated. Looking forward to hearing all of your thoughts! Thank you in advance. Best,0 -
Why Google isn't indexing my images?
Hello, on my fairly new website Worthminer.com I am noticing that Google is not indexing images from my sitemap. Already 560 images submitted and Google indexed only 3 of them. Altough there is more images indexed they are not indexing any new images, and I have no idea why. Posts, categories and other urls are indexing just fine, but images not. I am using Wordpress and for sitemaps Wordpress SEO by yoast. Am I missing something here? Why Google won't index my images? Thanks, I appreciate any help, David xv1GtwK.jpg
Intermediate & Advanced SEO | | Worthminer1 -
Google Indexing of Images
Our site is experiencing an issue with indexation of images. The site is real estate oriented. It has 238 listings with about 1190 images. The site submits two version (different sizes) of each image to Google, so there are about 2,400 images. Only several hundred are indexed. Can adding Microdata improve the indexation of the images? Our site map is submitting images that are on no-index listing pages to Google. As a result more than 2000 images have been submitted but only a few hundred have been indexed. How should the site map deal with images that reside on no-index pages? Do images that are part of pages that are set up as "no-index" need a special "no-index" label or special treatment? My concern is that so many images that not indexed could be a red flag showing poor quality content to Google. Is it worth investing in correcting this issue, or will correcting it result in little to no improvement in SEO? Thanks, Alan
Intermediate & Advanced SEO | | Kingalan10 -
Can Google read content/see links on subscription sites?
If an article is published on The Times (for example), can Google by-pass the subscription sign-in to read the content and index the links in the article? Example: http://www.thetimes.co.uk/tto/life/property/overseas/article4245346.ece In the above article there is a link to the resort's website but you can't see this unless you subscribe. I checked the source code of the page with the subscription prompt present and the link isn't there. Is there a way that these sites deal with search engines differently to other user agents to allow the content to be crawled and indexed?
Intermediate & Advanced SEO | | CustardOnlineMarketing0 -
Huge e-commerce site migration - what to do with product pages?
My very large e-commerce client is about to undergo a site migration in which every product page URL will be changing. I am already planning my 301 redirect process for the top ~1,000 pages on the site (categories, products, and more) but this will not account for the more than 1,000 products on the site. The client specified that they don't want to implement much more than 1,000 redirects so as to avoid impacting site performance. What is the best way to handle these pages without causing hundreds of 404 errors on site migration day? Thanks!
Intermediate & Advanced SEO | | FPD_NYC0 -
Google webmaster tools showing "no data available" for links to site, why?
In my google webmaster account I'm seeing all the data in other categories except links to my site. When I click links to my site I get a "no data available" message. Does anyone know why this is happening? And if so, what to do to fix it? Thanks.
Intermediate & Advanced SEO | | Nicktaylor10 -
Best practice for removing indexed internal search pages from Google?
Hi Mozzers I know that it’s best practice to block Google from indexing internal search pages, but what’s best practice when “the damage is done”? I have a project where a substantial part of our visitors and income lands on an internal search page, because Google has indexed them (about 3 %). I would like to block Google from indexing the search pages via the meta noindex,follow tag because: Google Guidelines: “Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.” http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35769 Bad user experience The search pages are (probably) stealing rankings from our real landing pages Webmaster Notification: “Googlebot found an extremely high number of URLs on your site” with links to our internal search results I want to use the meta tag to keep the link juice flowing. Do you recommend using the robots.txt instead? If yes, why? Should we just go dark on the internal search pages, or how shall we proceed with blocking them? I’m looking forward to your answer! Edit: Google have currently indexed several million of our internal search pages.
Intermediate & Advanced SEO | | HrThomsen0