Crawling/indexing of near duplicate product pages
-
Hi,
Hope someone can help me out here. This is the current situation:
We sell stones/gravel/sand/pebbles etc. for gardens. I will use one type of pebble and its corresponding pages/URLs to illustrate my question --> black beach pebbles.
- We have a 'top' product page for black beach pebbles which lists the different quantities (ranging from 20 kg to 1,600 kg).
- There is no search volume for the individual quantities.
- The 'top' page does not link to the pages for the individual quantities.
- The content on the quantity pages is not exactly the same (different price + slightly different content), but much of it is identical.
Current situation:
- Most quantity pages (about 95%) have no internal links, but the sitemap does contain all of them.
- Because the sitemap contains all these URLs, Google crawls them frequently (I checked the log files) and has indexed them.
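The log-file check described above can be sketched in Python. This is a minimal sketch, assuming the Apache/Nginx "combined" log format and a hypothetical URL pattern for quantity pages (slugs ending in something like `-20kg`); adjust both to the real site. It counts Googlebot requests to quantity pages versus all other pages. The user-agent string can be spoofed, so Google recommends confirming genuine Googlebot traffic with a reverse DNS lookup before relying on the numbers.

```python
import re
from collections import Counter

# Assumed: Apache/Nginx "combined" log format; captures the request path and user agent.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
# Hypothetical pattern for quantity URLs such as /black-beach-pebbles-20kg
QUANTITY_URL = re.compile(r"-\d+kg/?$")


def googlebot_hits(lines):
    """Count Googlebot requests to quantity pages vs. all other pages."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # unparseable line, or not (claimed) Googlebot
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        counts["quantity" if QUANTITY_URL.search(path) else "other"] += 1
    return counts
```

Usage: `with open("access.log") as f: print(googlebot_hits(f))`, where the log file name is a placeholder.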
Problems:
- Google spends its time crawling irrelevant pages --> our website is not that big, so these quantity URLs roughly double the total number of URLs.
- Having URLs in the sitemap that have no internal links is a problem in its own right.
- All these pages are indexed, so every type of gravel/pebble has near duplicates in the index.
My solution:
- Remove these URLs from the sitemap --> that will probably stop Google from crawling these pages regularly.
- Put a canonical on the quantity pages pointing to the 'top' product page --> that will hopefully remove the irrelevant (no search volume) near duplicates from the index.
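If the sitemap is generated by a script, the first step of this plan (dropping the quantity URLs) could look roughly like the sketch below. It is only a sketch: the `-<number>kg` URL pattern and the example.com URLs are hypothetical placeholders for however the shop actually distinguishes quantity pages.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Hypothetical pattern for quantity URLs such as /black-beach-pebbles-20kg
QUANTITY_URL = re.compile(r"-\d+kg/?$")


def build_sitemap(urls):
    """Return sitemap XML listing every URL except the quantity variants."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if QUANTITY_URL.search(url):
            continue  # quantity pages are deliberately left out
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    # ElementTree escapes characters such as "&" in the URLs automatically
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")
```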
My questions:
- To see the canonical, Google will need to crawl these pages. Will Google still do that after I remove them from the sitemap?
- Do you agree that these pages are near duplicates and that it is best to remove them from the index?
- A few of these quantity pages (a few percent) do have internal links because of a sale campaign. So there will be some (not many) internal links pointing to non-canonical pages. Would that be a problem?
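To see how many internal links point at non-canonical quantity pages, a small audit like the one below could help. It is a sketch, assuming you already have each page's HTML and a hypothetical `canonical_of` mapping from URL to its declared canonical (for example, built from a crawl export). It only reads `<a href>` links and does not render JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def links_to_non_canonical(page_url, html, canonical_of):
    """Return (link, canonical) pairs for links whose target canonicalises elsewhere."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return [
        (link, canonical_of[link])
        for link in collector.links
        if link in canonical_of and canonical_of[link] != link
    ]
```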
Thanks a lot in advance for your help!
Best!
-
Hi Joseph, thanks for your reply, really helpful! A 301 is not really an option, because these quantity URLs are sometimes used for promotions and need to stay reachable. So I guess canonicals are the second-best solution.
We will implement the solution I described and see what happens. Thanks again!
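A minimal sketch of the canonical tag the quantity pages would carry, assuming a hypothetical mapping from quantity paths to their 'top' product page and a placeholder domain. Absolute URLs are used because Google recommends them for `rel="canonical"`. Keep in mind that Google treats the canonical as a strong hint rather than a directive, so it may still choose its own canonical if the pages differ too much.

```python
from html import escape

# Hypothetical mapping from quantity pages to their 'top' product page
QUANTITY_TO_TOP = {
    "/black-beach-pebbles-20kg": "/black-beach-pebbles",
    "/black-beach-pebbles-1600kg": "/black-beach-pebbles",
}
ORIGIN = "https://www.example.com"  # placeholder domain


def canonical_tag(path):
    """Return the <link rel="canonical"> element for the page at `path`.

    Quantity pages point to their top product page; every other page
    (including the top page itself) gets a self-referencing canonical.
    """
    target = ORIGIN + QUANTITY_TO_TOP.get(path, path)
    return f'<link rel="canonical" href="{escape(target, quote=True)}">'
```

The promotional quantity URLs stay reachable for visitors; only the indexing signal changes.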
-
Hello there,
To answer your questions,
1. Google will still crawl your pages even if they are not in the sitemap, unless you disallow them in your robots.txt.
2. If the pages are similar and differ mainly in the quantity, couldn't you consolidate them into a single page that lists all the quantities your company sells and then 301 redirect the other pages to that consolidated page?
3. It doesn't seem likely to cause any problems or hurt your SEO performance, but you could always change these links to point to the canonical URL.
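Point 1 can be checked locally with Python's standard `urllib.robotparser`. The `Disallow` rule below is a hypothetical example that blocks every URL starting with `/black-beach-pebbles-`. One caveat for the plan in the question: a URL disallowed in robots.txt cannot be fetched, so Google would never see the canonical tag on it (and a blocked URL can still be indexed without its content if it is linked). Blocking and canonicalising the same pages therefore work against each other. Note also that `urllib.robotparser` does plain prefix matching and does not implement Google's `*` and `$` wildcard extensions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every quantity URL of this product
ROBOTS_TXT = """\
User-agent: *
Disallow: /black-beach-pebbles-
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, agent="Googlebot"):
    """True if robots.txt allows `agent` to fetch `path` on the placeholder host."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


for path in ("/black-beach-pebbles", "/black-beach-pebbles-20kg"):
    print(path, "crawlable" if is_crawlable(path) else "blocked")
```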
Hope this helps,
Joseph Yap
Related Questions
-
SEO impact of mouse over text on product page
Hi, we recently improved our product page to show more color options, like this: http://www.prams.net/knorr-baby-voletto-sport-pram-stroller-reversible-seat-green-a?inref=home-left To improve SEO, we expanded our rich snippets as follows: we added all color options, SKUs and prices as "items offered", and we now show the highest and lowest price and have eliminated the base price: https://developers.google.com/structured-data/testing-tool/ Google now shows the price range in the rich snippet. The question is: the user sees the original color, the price and the SKU only when mousing over the small images, so we are worried that this could be treated as "hidden text". Does anybody have experience with this, or a better way to solve it? Thanks in advance, Dieter
Intermediate & Advanced SEO | Storesco
-
Duplicate Content: Is a product feed/page rolled out across subdomains deemed duplicate content?
A company has a TLD (top-level domain) which lists every single product: company.com/product/name.html The company also has subdomains (tailored to a range of products) which list a chosen selection of the products from the TLD, sort of like a feed: subdomain.company.com/product/name.html The content on the TLD and subdomain product pages is exactly the same and cannot be changed; the CSS and HTML are slightly different, but the content (text and images) is exactly the same! My concern (and rightly so) is that Google will deem this to be duplicate content, so I'm going to have to add a rel canonical tag to the head of all subdomain pages, pointing to the original product page on the TLD. Does this sound like the correct thing to do? Or is there a better solution? Moving on, not only are products fed onto subdomains, there are a handful of other domains which list the products; again, the content (text and images) is exactly the same: other.com/product/name.html Would I be best placed to add a rel canonical tag to the head of the product pages on the other domains, pointing to the original product page on the actual TLD? Does rel canonical work across domains? Would the product pages with a rel canonical tag in the head still rank? Let me know if there is a better solution all-round!
Intermediate & Advanced SEO | iam-sold
-
How to rank product pages?
Hi guys, Please advise me on improving the rankings of my product pages. We are doing well for head terms and categories but are not ranking for product pages. We have issues with product pages which I think are hard to tackle. For instance, we have duplicate products (different colors), duplicate content internally (colors) and duplicate content from manufacturer websites. Product pages are linked from sub-categories, i.e. Home > Category > Sub-Category (20 per page), with pagination for the next 20 and so on. Product pages are also linked internally via widgets such as "Other similar products", "Featured products" etc. Another issue with our product pages is that we use a third-party reviews platform, and whenever users add reviews to product pages, this platform creates hyperlinks with anchor text that is not relevant to the product. Example - http://goo.gl/NUG652 Can somebody please give some advice on how to improve rankings for product pages? Writing unique content for thousands of pages is not possible. Even our competitors are not writing unique content.
Intermediate & Advanced SEO | Webmaster_SEO
-
Product Tag Pages - Shopify
My website is Sportiqe.com. We sell t-shirts and use Shopify. We're finding that Google is assigning a higher than normal (normal being "1") page authority ranking on our product tag pages (ie - Products Tagged "knicks"). Would it make sense to do 301 redirects for these product tag pages to the Product pages we want to rank for? (ie - would we do a 301 redirect for a page called "Products Tagged 'Knicks'" to our "New York Knicks Shirts" page?) OR Would it make sense to change these Product Tag Page titles to another key term to have multiple search results (assuming that ordering the products in a different way would eliminate any Duplicate Page Content issues?) For example, renaming the page title from "Products Tagged Knicks" to "TAG NAME | Sportiqe Apparel" Appreciate any insight from the Moz community, Shopify store managers and fellow t-shirt enthusiasts.
Intermediate & Advanced SEO | farmiloe
-
I have removed 2,000+ pages but Google still says I have 3,000+ pages indexed
Good afternoon, I run an office equipment website called top4office.co.uk. My predecessor decided to make an exact copy of the content on our existing site top4office.com and place it on the top4office.co.uk domain, which included over 2k thin pages. Since coming in, I have hired a copywriter who has rewritten all the important content, and I have removed over 2k thin pages. I have set up 301s, blocked the thin pages using robots.txt and then used Google's removal tool to remove the pages from the index, which was done successfully. But although they were removed and can no longer be found in Google, when I use site:top4office.co.uk I still have over 3k indexed pages (originally I had 3,700). Does anyone have any ideas why this is happening and, more importantly, how I can fix it? Our ranking on this site is woeful in comparison to what it was in 2011. I have a deadline and was wondering how quickly, in your opinion, all these changes will impact my SERP rankings? Look forward to your responses!
Intermediate & Advanced SEO | apogeecorp
-
Duplicate Content From Indexing of Non-File-Extension Page
Google has somehow indexed a page of mine without the .html extension: it indexed www.samplepage.com/page, so I am showing duplicate content because Google also sees www.samplepage.com/page.html How can I force Google or Bing or whoever to only index and see the page including the .html extension? I know people say not to use file extensions on pages, but I want to, so please, anybody... HELP!!!
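One common approach is a 301 from the extensionless URL to the `.html` version, plus a self-referencing canonical on the `.html` page. In practice this lives in the web server's rewrite rules; the decision logic is sketched below in Python, assuming every extensionless path has a matching `.html` file and that query strings have already been stripped. The function name is hypothetical.

```python
from http import HTTPStatus


def extension_redirect(path):
    """Decide how to answer a request path (query string already stripped).

    Returns (status, location): a 301 to the .html URL for extensionless
    page paths, otherwise 200 with no redirect.
    """
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment and "." not in last_segment:
        # e.g. /page -> /page.html (assumes that file exists)
        return HTTPStatus.MOVED_PERMANENTLY, path + ".html"
    # Directory-style paths ("/", "/dir/") and paths with an extension are served as-is
    return HTTPStatus.OK, None
```

For example, `extension_redirect("/page")` returns `(HTTPStatus.MOVED_PERMANENTLY, "/page.html")`, while `/page.html` is served normally.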
Intermediate & Advanced SEO | WebbyNabler
-
Http://blogsearch.google.com/ping
Is there any reason why a website would submit all their content (videos, photo galleries, articles) to this?
Intermediate & Advanced SEO | MargaritaS
-
What are the different tactics for getting ranked/included in Google Finance searches such as http://www.google.com/finance/company_news?q=NASDAQ:ADBE
I don't know what ranking factors they are using for this feed. The results vary greatly from a search done at google.com or google.com/news and google.com/finance I'm working with a website that regularly publishes finance-related news and currently gets traffic from google finance. I'm wondering what we can do to optimize our news articles to possibly show more prominently or more often. Thanks
Intermediate & Advanced SEO | joemascaro