Crawling/indexing of near duplicate product pages
-
Hi,
Hope someone can help me out here. This is the current situation:
We sell stones, gravel, sand, pebbles etc. for gardens. I will use one type of pebble and its corresponding pages/URLs to illustrate my question: black beach pebbles.
- We have a 'top' product page for black beach pebbles on which you can choose between different quantities (ranging from 20 kg to 1600 kg).
- There is no search volume for the individual quantities.
- The 'top' page does not link to the pages for the different quantities.
- The content on the quantity pages is not exactly the same (different price and slightly different content), but a lot of it is identical.
Current situation:
- Most pages for the different quantities (about 95%) do not have internal links.
- But the sitemap does contain all of these pages.
- Because the sitemap contains all these URLs, Google frequently crawls them (I checked the log files) and has indexed them.
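For anyone who wants to repeat the log file check, here is a rough sketch of how it could be done. It assumes the server writes the common "combined" access log format and that quantity pages can be recognised by a weight suffix such as `-20kg` in the path; both the URL pattern and the sample lines are hypothetical, not our real data.

```python
import re
from collections import Counter

# Assumption: quantity pages end in a weight suffix such as "-20kg".
QUANTITY_PATH = re.compile(r"-\d+kg(?:/|$)")

# Combined log format: ... "GET /path HTTP/1.1" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count Googlebot requests, split into quantity pages and all other pages."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        # The user-agent string can be spoofed; this is only a first approximation.
        if not match or "Googlebot" not in match.group("agent"):
            continue
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        kind = "quantity" if QUANTITY_PATH.search(path) else "other"
        counts[kind] += 1
    return counts


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Oct/2016:13:55:36 +0000] "GET /black-beach-pebbles-20kg HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Oct/2016:13:56:02 +0000] "GET /black-beach-pebbles-1600kg HTTP/1.1" 200 5230 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Oct/2016:13:57:11 +0000] "GET /black-beach-pebbles HTTP/1.1" 200 8040 "-" "{GOOGLEBOT}"',
    '203.0.113.7 - - [10/Oct/2016:13:58:40 +0000] "GET /black-beach-pebbles-20kg HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits = googlebot_hits(sample_log)
print(hits["quantity"], hits["other"])
```

Comparing the two counts over a few weeks of logs gives a feel for how much of Google's crawling goes to the quantity URLs.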
Problems:
- Google spends its time crawling irrelevant pages. Our website is not that big, so these quantity URLs roughly double the total number of URLs.
- Having URLs in the sitemap that do not have an internal link is a problem on its own.
- All these pages are indexed, so every type of gravel/pebbles has near duplicates in the index.
My solution:
- Remove these URLs from the sitemap. That will probably stop Google from crawling these pages regularly.
- Put a canonical on the quantity pages pointing to the top product page. That will hopefully remove the irrelevant (no search volume) near duplicates from the index.
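The two steps above could be sketched roughly like this. It is only an illustration: it assumes a standard XML sitemap and that every quantity URL is the top product URL plus a weight suffix such as `-20kg`. The `example.com` URLs and the helper names `strip_quantity_urls` and `canonical_tag` are made up for the example.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)  # keep the output free of "ns0:" prefixes

# Assumption: quantity URL = top product URL + "-<weight>kg".
QUANTITY_SUFFIX = re.compile(r"-\d+kg/?$")


def strip_quantity_urls(sitemap_xml):
    """Return the sitemap XML without the <url> entries for quantity pages."""
    root = ET.fromstring(sitemap_xml)
    for url in list(root.findall(f"{{{NS}}}url")):
        loc = url.findtext(f"{{{NS}}}loc", default="").strip()
        if QUANTITY_SUFFIX.search(loc):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")


def canonical_tag(quantity_url):
    """Build the canonical link element a quantity page would carry in its <head>."""
    top_url = QUANTITY_SUFFIX.sub("", quantity_url)
    return f'<link rel="canonical" href="{escape(top_url, quote=True)}">'


sitemap = f"""<urlset xmlns="{NS}">
  <url><loc>https://www.example.com/black-beach-pebbles</loc></url>
  <url><loc>https://www.example.com/black-beach-pebbles-20kg</loc></url>
  <url><loc>https://www.example.com/black-beach-pebbles-1600kg</loc></url>
</urlset>"""

cleaned = strip_quantity_urls(sitemap)
print(cleaned)
print(canonical_tag("https://www.example.com/black-beach-pebbles-20kg"))
```

The quantity pages themselves stay online and reachable; only their sitemap entries disappear and each one points to the top product page as its canonical.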
My questions:
- To be able to see the canonical, Google will need to crawl these pages. Will Google still do that after I remove them from the sitemap?
- Do you agree that these pages are near duplicates and that it is best to remove them from the index?
- A few of these quantity pages (a few percent of them) do have internal links because of a sale campaign. So there will be some (not many) internal links pointing to non-canonical pages. Would that be a problem?
Thanks a lot in advance for your help!
Best!
-
Hi Joseph, thanks for your reply, really helpful! A 301 is not really an option, because these quantity URLs are sometimes used for promotions and need to stay reachable. Therefore I guess canonicals are the next best solution.
We will implement the solution I described and see what will happen. Thanks again!
-
Hello there,
To answer your questions,
1. Google will still crawl your pages even if they are not in the sitemap, unless you disallow them in your robots.txt.
2. If the content is similar and the main difference is the quantity, couldn't you consolidate them into one single page that lists all the quantities your company sells, and then 301 redirect the other pages to the consolidated one?
3. It doesn't seem like this will cause any problems or hurt your SEO performance, but you could always change these links to point to the canonical URL.
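On point 1, it is worth double-checking that robots.txt does not block the quantity pages, since a page Google is not allowed to fetch cannot show its canonical. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rules and the `example.com` URLs are invented for illustration, and this parser only approximates how Google reads the file.

```python
from urllib.robotparser import RobotFileParser


def can_googlebot_fetch(robots_txt, url):
    """Check a URL against robots.txt rules given as plain text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical rules: one file blocks a /quantities/ folder, the other blocks nothing.
blocking_rules = "User-agent: *\nDisallow: /quantities/\n"
open_rules = "User-agent: *\nDisallow:\n"

quantity_url = "https://www.example.com/quantities/black-beach-pebbles-20kg"
top_url = "https://www.example.com/black-beach-pebbles"

print(can_googlebot_fetch(blocking_rules, quantity_url))
print(can_googlebot_fetch(blocking_rules, top_url))
print(can_googlebot_fetch(open_rules, quantity_url))
```

If the first check comes back `False` for your real robots.txt, the canonical on that page will not be seen until the Disallow rule is lifted.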
Hope this helps,
Joseph Yap