Crawling/indexing of near duplicate product pages

AMAGARD

Hi,

Hope someone can help me out here. This is the current situation:

We sell stones, gravel, sand, pebbles, etc. for gardens. I will use one type of pebble and its corresponding pages/URLs to illustrate my question --> black beach pebbles.

- We have a 'top' product page for black beach pebbles on which you can find the different quantities (ranging from 20 kg to 1600 kg).
- There is no search volume for the different quantities.
- The 'top' page does not link to the pages for the different quantities.
- The content on the pages for the different quantities is not exactly the same (different price + slightly different content), but a lot of the content is the same.

Current situation:
- Most pages for the different quantities do not have internal links (about 95%)

But the sitemap does contain all of these pages.
Because the sitemap contains all these URLs, Google frequently crawls them (I checked the log files) and has indexed them.
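For anyone wanting to repeat the log file check, here is a minimal sketch of how Googlebot requests to the quantity pages could be counted. It assumes the server writes the Apache/Nginx "combined" log format and that quantity pages share a hypothetical `/qty/` path segment; the sample log lines are made up for illustration, and neither assumption comes from the actual site.

```python
import re
from collections import Counter

# Assumed log format: Apache/Nginx "combined". Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, path_marker="/qty/"):
    """Count requests with a Googlebot user agent to paths containing path_marker.

    path_marker is a hypothetical way to recognise quantity pages.
    """
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Skip lines that do not fit the assumed format.
        if "Googlebot" in match.group("agent") and path_marker in match.group("path"):
            hits[match.group("path")] += 1
    return hits


# Made-up sample lines, only to show the shape of the input.
sample = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /black-beach-pebbles/qty/20kg HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2023:14:02:11 +0000] "GET /black-beach-pebbles HTTP/1.1" 200 7300 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:14:05:00 +0000] "GET /black-beach-pebbles/qty/20kg HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

for path, count in googlebot_hits(sample).most_common():
    print(count, path)
```

The user agent string can be spoofed, so this only gives a rough picture; running the same count before and after a sitemap change would show whether the crawl frequency actually drops.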

Problems:

- Google spends its time crawling irrelevant pages --> our entire website is not that big, so these quantity URLs roughly double the total number of URLs.
- Having URLs in the sitemap that have no internal links is a problem in itself.
- All these pages are indexed, so every type of gravel/pebble has near duplicates in the index.

My solution:

- Remove these URLs from the sitemap --> that will probably stop Google from regularly crawling these pages.
- Put a canonical on the quantity pages pointing to the top product page --> that will hopefully remove the irrelevant (no search volume) near duplicates from the index.
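The two steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the site's actual setup: `example.com`, the `/qty/` path segment, and the helper names `is_quantity_url`, `strip_quantity_urls` and `canonical_tag` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def is_quantity_url(url):
    # Hypothetical rule: quantity pages live under a "/qty/" path segment.
    return "/qty/" in url


def strip_quantity_urls(sitemap_xml):
    """Return the sitemap XML with all quantity-page <url> entries removed."""
    ET.register_namespace("", SITEMAP_NS)  # Keep the default namespace on output.
    root = ET.fromstring(sitemap_xml)
    for url_el in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url_el.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        if is_quantity_url(loc):
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")


def canonical_tag(quantity_url):
    """Build the canonical link tag for a quantity page, pointing at its top product page."""
    top_url = quantity_url.split("/qty/")[0]  # Relies on the assumed URL scheme.
    return f'<link rel="canonical" href="{escape(top_url, quote=True)}">'


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/black-beach-pebbles</loc></url>
  <url><loc>https://example.com/black-beach-pebbles/qty/20kg</loc></url>
</urlset>"""

cleaned = strip_quantity_urls(sitemap)
print(cleaned)
print(canonical_tag("https://example.com/black-beach-pebbles/qty/20kg"))
```

The canonical tag goes in the `<head>` of each quantity page. Note that a canonical is a hint rather than a directive, so Google may still choose to keep some quantity pages indexed.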

My questions:

1. To be able to see the canonical, Google will need to crawl these pages. Will Google still do that after they are removed from the sitemap?
2. Do you agree that these pages are near duplicates and that it is best to remove them from the index?
3. A few of these quantity pages (a few percent) do have internal links because of a sale campaign. So there will be some (not many) internal links pointing to non-canonical pages. Would that be a problem?

Thanks a lot in advance for your help!

Best!

AMAGARD

Hi Joseph, thanks for your reply, really helpful! A 301 is not really an option, because these quantity URLs are sometimes used for promotions and need to be reachable. Therefore I guess canonicals are the next best solution.

We will implement the solution I described and see what will happen. Thanks again!

Seenlyst

Hello there,

To answer your questions:

1. Google will still crawl your pages even if they are not in the sitemap, unless you disallow them in your robots.txt.
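To illustrate the robots.txt point: a disallow rule stops crawling altogether, which also means Google could no longer fetch the page to read a canonical tag on it. Below is a small sketch using Python's standard `urllib.robotparser`; the rules and the `example.com` URL are assumptions made up for the example, and Python's parser uses simple prefix matching, so it only approximates how Googlebot interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text allows agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt variants for illustration.
open_rules = "User-agent: *\nDisallow:\n"
blocking_rules = "User-agent: *\nDisallow: /black-beach-pebbles/qty/\n"

url = "https://example.com/black-beach-pebbles/qty/20kg"

print(can_crawl(open_rules, url))      # Quantity page may be crawled.
print(can_crawl(blocking_rules, url))  # Quantity page is blocked from crawling.
```

So for the canonical approach to work, the quantity pages should stay crawlable, that is, not be disallowed in robots.txt.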

2. If the content is similar and the main difference is the quantity, couldn't you consolidate them into a single page that lists all the quantities your company sells, and then 301 redirect the other pages to the consolidated one?

3. That doesn't seem likely to cause any problems or hurt your SEO performance, but you could always change these links to point to the canonical URL.

Hope this helps,
Joseph Yap
