Duplicate content that looks unique
-
OK, bit of an odd one. The SEOmoz crawler has flagged the following pages up as duplicate content. Does anyone have any idea what's going on?
http://www.gear-zone.co.uk/blog/november-2011/gear$9zone-guide-to-winter-insulation
http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone
http://www.gear-zone.co.uk/blog/july-2011/telephone-issues-$9-2nd-july-2011
http://www.gear-zone.co.uk/blog/september-2011/gear$9zone-guide-to-nordic-walking-poles
http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone
https://www.google.com/webmasters/tools/googlebot-fetch?hl=en&siteUrl=http://www.gear-zone.co.uk/
-
Good question, because those pages look different to a human. The SEOmoz web app uses a similarity threshold of 95% of the html code. This takes everything on the page, both hidden and visible into account.
In this case, it's counting all of the navigation and sidebar as well, which is significant. What's left of the unique content - the part that matters, makes up less than 5% of the code.
Here's a tool you can use to check the similarity: http://www.duplicatecontent.net/
I ran the pages through a couple of tools which showed 96% HTML similarity.
(but only a 92% text similarity - which is good, but not great)
For perspective, take a look at Google's cached versions of one of these pages. This is how googlebot sees the page: http://webcache.googleusercontent.com/search?q=cache:4fKrbNTUnegJ:www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone+http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone&hl=en&gl=us&strip=1G
Since Panda, when I see a site with this many navigation links, I usually advise them to restructure their site architecture into more of a Pyramid shape, so that you reduce the overall navigation on each page.
There are 2 ways to look at this: First of all, Google is much more sophisticated than SEOmoz at detecting duplicate content, and they are also better at contextual analysis - so they can probably tell these are not true duplicates.
Hope this helps! Best of luck with your SEO.
-
SEOmoz looks at the code on the page when it looks at duplicate content scores. My hunch is that there's a lot of identical code on those pages, which is causing the warning.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Best method for blocking a subdomain with duplicated content
Hello Moz Community Hoping somebody can assist. We have a subdomain, used by our CMS, which is being indexed by Google.
Intermediate & Advanced SEO | | KateWaite
http://www.naturalworldsafaris.com/
https://admin.naturalworldsafaris.com/ The page is the same so we can't add a no-index or no-follow.
I have both set up as separate properties in webmaster tools I understand the best method would be to update the robots.txt with a user disallow for the subdomain - but the robots text is only accessible on the main domain. http://www.naturalworldsafaris.com/robots.txt Will this work if we add the subdomain exclusion to this file? It means it won't be accessible on https://admin.naturalworldsafaris.com/robots.txt (where we can't create a file). Therefore won't be seen within that specific webmaster tools property. I've also asked the developer to add a password protection to the subdomain but this does not look possible. What approach would you recommend?0 -
Magento products and eBay - duplicate content risk?
Hi, We are selling about 1000 sticker products in our online store and would like to expand a large part of our products lineup to eBay as well. There are pretty good modules for this as I've heard. I'm just wondering if there will be duplicate content problems if I sync the products between Magento and eBay and they get uploaded to eBay with identical titles, descriptions and images? What's the workaround in this case? Thanks!
Intermediate & Advanced SEO | | speedbird12290 -
[E-commerce] Duplicate content due to color variations (canonical/indexing)
Hello, We currently have a lot of color variations on multiple products with almost the same content. Even with our canonicals being set, Moz's crawling tool seems to flag them as duplicate content. What we have done so far: Choosing the best-selling color variation (our "master product") Adding a rel="canonical" to every variation (with our "master product" as the canonical URL) In my opinion, it should be enough to address this issue. However, being given the fact that it's flagged as duplicate by Moz, I was wondering if there is something else we should do? Should we add a "noindex,follow" to our child products and "index,follow" to our master product? (sounds to me like such a heavy change) Thank you in advance
Intermediate & Advanced SEO | | EasyLounge0 -
Duplicate page content errors stemming from CMS
Hello! We've recently relaunched (and completely restructured) our website. All looks well except for some duplicate content issues. Our internal CMS (custom) adds a /content/ to each page. Our development team has also set-up URLs to work without /content/. Is there a way I can tell Google that these are the same pages. I looked into the parameters tool, but that seemed more in-line with ecommerce and the like. Am I missing anything else?
Intermediate & Advanced SEO | | taylor.craig0 -
Duplicate content on sites from different countries
Hi, we have a client who currently has a lot of duplicate content with their UK and US website. Both websites are geographically targeted (via google webmaster tools) to their specific location and have the appropriate local domain extension. Is having duplicate content a major issue, since they are in two different countries and geographic regions of the world? Any statement from Google about this? Regards, Bill
Intermediate & Advanced SEO | | MBASydney0 -
PDF on financial site that duplicates ~50% of site content
I have a financial advisor client who has a downloadable PDF on his site that contains about 9 pages of good info. Problem is much of the content can also be found on individual pages of his site. Is it best to noindex/follow the pdf? It would be great to let the few pages of original content be crawlable, but I'm concerned about the duplicate content aspect. Thanks --
Intermediate & Advanced SEO | | 540SEO0 -
Duplicate Content Question
My understanding of duplicate content is that if two pages are identical, Google selects one for it's results... I have a client that is literally sharing content real-time with a partner...the page content is identical for both sites, and if you update one page, teh otehr is updated automatically. Obviously this is a clear cut case for canonical link tags, but I'm cuious about something: Both sites seem to show up in search results but for different keywords...I would think one domain would simply win out over the other, but Google seems to show both sites in results. Any idea why? Also, could this duplicate content issue be hurting visibility for both sites? In other words, can I expect a boost in rankings with the canonical tags in place? Or will rankings remain the same?
Intermediate & Advanced SEO | | AmyLB0 -
Could you use a robots.txt file to disalow a duplicate content page from being crawled?
A website has duplicate content pages to make it easier for users to find the information from a couple spots in the site navigation. Site owner would like to keep it this way without hurting SEO. I've thought of using the robots.txt file to disallow search engines from crawling one of the pages. Would you think this is a workable/acceptable solution?
Intermediate & Advanced SEO | | gregelwell0