Crawling/indexing of near duplicate product pages
-
Hi,
Hope someone can help me out here. This is the current situation:
We sell stones/gravel/sand/pebbles etc. for gardens. I will take a type of pebbles and the corresponding pages/URL's to illustrate my question --> black beach pebbles.
- We have a 'top' product page for black beach pebbles on which you can find different types of quantities (differing from 20kg untill 1600 kg).
- There is not any search volume related to the different quantities
- The 'top' page does not link to the pages for the different quantities
- The content on the pages for the different quantities is not exactly the same (different price + slightly different content). But a lot of the content is the same.
Current situation:
- Most pages for the different quantities do not have internal links (about 95%)- But the sitemap does contain all of these pages.
- Because the sitemap contains all these URL's, google frequently crawls them (I checked the logfiles) and has indexed them.
Problems:
- Google spends its time crawling irrelevant pages --> our entire website is not that big, so these quantity URL's kind of double the total number of URL's.
- Having url's in the sitemap that do not have an internal link is a problem on its own
- All these pages are indexed so all sorts of gravel/pebbles have near duplicates.
My solution:
- remove these URL's from the sitemap --> that will probably stop Google from regularly crawling these pages
- Putting a canonical on the quantity pages pointing to the top-product page. --> that will hopefully remove the irrelevant (no search volume) near duplicates from the index
My questions:
- To be able to see the canonical, google will need to crawl these pages. Will google still do that after removing them from the sitemap?
- Do you agree that these pages are near duplicates and that it is best to remove them from the index?
- A few of these quantity pages do have intenral links (a few procent of them) because of a sale campaign. So there will be some (not much) internal links pointing to non-canonical pages. Would that be a problem?
Thanks a lot in advance for your help!
Best!
-
Hi Joseph, thanks for your reply, really helpful! 301 is not really an option, because these quantity URL's are sometimes used for promotions and need to be reachable. Therefore I guess canonicals are the second best solution.
We will implement the solution I described and see what will happen. Thanks again!
-
Hello there,
To answer your questions,
1. Google will still crawl your pages even if it's not from the sitemap unless you specify disallow from your robots.txt
2. If they are similar content with the main difference at "quantities" couldn't you consolidate them into one single page that lists all the quantities your company sell in and then 301 redirect the other pages to the consolidated one?
3. It doesn't seem like going to be causing any problem nor hurting your SEO performance, but you could always change these link to the canonical link.
Hope this helps,
Joseph Yap
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
No Index No follow instead of Rel canoncical on product pages
Hi all, we handle our product pages no with rel canonical now, we have 1 url that is indexed http://www.prams.net/cam-combi-family the other colours have different urls like http://www.prams.net/cam-combi-family-3-in-1-pram-reversible-seat-car-seat-grey-d which canonicalize to the indexed page. Google still crawls all those pages. For crawl budget reasons we want to use "no index, no follow" instead on these pages (the pages for the other colours)? Google would then crawl fewer pages more often? Does this make sense? Are their any downsides doing it? Thanks in advance Dieter
Intermediate & Advanced SEO | | Storesco1 -
What to do with large number of old/outdated pages?
Are we redoing a large portion of our site (not ecommerce). We have a large number of pages (about 2000 indexed pages, out of about 3000) that have been forgetten about until recently, are very outdated, don't drive any traffic (according to Google Analytics) But they are ranking very well for the targeting keyword (#3 organic for most). What should I do with those pages? Could you give any guidance on whether we should or what affect it might have one the rest of the website if we delete those pages or simply 301 redirecting all those pages to the home page?
Intermediate & Advanced SEO | | aphoontrakul0 -
Duplicate page url crawl report
Details: Hello. Looking at the duplicate page url report that comes out of Moz, is the best tactic to a) use 301 redirects, and b) should the url that's flagged for duplicate page content be pointed to the referring url? Not sure where the 301 redirect should be applied... should this url, for example: <colgroup><col width="452"></colgroup>
Intermediate & Advanced SEO | | compassseo
| http://newgreenair.com/website/blog/ | which is listed in the first column of the Duplicate Page Content crawl, be pointed to referring url in the same spreadsheet? Or, what's the best way to apply the 301 redirect? thanks!0 -
Duplicate/ <title>element too long issues</title>
I have a "duplicate <title>"/"<title> element too long" issue with thousands of pages. In the future I would like to automate these in a way that keeps them from being duplicated AND too long. The solution I came up with was to standardize these monthly posts with a similar, shorter, <title>, but then differentiate by adding the month and the year of the post at the end of each <title>. Hundreds of these come out every week, so it is hard to sit there and come up with a unique <title> every time. With this solution the <title> tags would undoubtedly be short enough, however my primary concern is, would simply adding the month and year at the end of each <title> be enough for Google/Moz to decide it is not a duplicate? How much variation is enough for it not to be deemed a duplicate <title>? </p></title>
Intermediate & Advanced SEO | | Brian_Dowd0 -
Glossary index and individual pages create duplicate content. How much might this hurt me?
I've got a glossary on my site with an index page for each letter of the alphabet that has a definition. So the M section lists every definition (the whole definition). But each definition also has its own individual page (and we link to those pages internally so the user doesn't have to hunt down the entire M page). So I definitely have duplicate content ... 112 instances (112 terms). Maybe it's not so bad because each definition is just a short paragraph(?) How much does this hurt my potential ranking for each definition? How much does it hurt my site overall? Am I better off making the individual pages no-index? or canonicalizing them?
Intermediate & Advanced SEO | | LeadSEOlogist0 -
Date of page first indexed or age of a page?
Hi does anyone know any ways, tools to find when a page was first indexed/cached by Google? I remember a while back, around 2009 i had a firefox plugin which could check this, and gave you a exact date. Maybe this has changed since. I don't remember the plugin. Or any recommendations on finding the age of a page (not domain) for a website? This is for competitor research not my own website. Cheers, Paul
Intermediate & Advanced SEO | | MBASydney0 -
Page Count in Webmaster Tools Index Status Versus Page Count in Webmaster Tools Sitemap
Greeting MOZ Community: I run www.nyc-officespace-leader.com, a real estate website in New York City. The page count in Google Webmaster Tools Index status for our site is 850. The page count in our Webmaster Tools Sitemap is 637. Why is there a discrepancy between the two? What does the Google Webmaster Tools Index represent? If we filed a removal request for pages we did not want indexed, will these pages still show in the Google Webmaster Tools page count despite the fact that they no longer display in search results? The number of pages displayed in our Google Webmaster Tools Index remains at about 850 despite the removal request. Before a site upgrade in June the number of URLs in the Google Webmaster Tools Index and Google Webmaster Site Map were almost the same. I am concerned that page bloat has something to do with a recent drop in ranking. Thanks everyone!! Alan
Intermediate & Advanced SEO | | Kingalan10 -
Can I Use Cross Domain Canonical For Duplicate Categories & Product Pages?
I want to fix issue regarding duplicate categories & product pages on my multiple eCommerce websites. http://www.vistastores.com/patio-umbrellas-fiberbuilt-umbrellas-llc-7gcrw-teal.html - Want to rank with this... http://www.vistapatioumbrellas.com/patio-umbrellas-fiberbuilt-umbrellas-llc-7gcrw-teal.html - Duplicate one! http://www.vistastores.com/patio-umbrellas - Want to rank with this... http://www.vistapatioumbrellas.com/patio-umbrellas - Duplicate one!
Intermediate & Advanced SEO | | CommercePundit0