Duplicate content that looks unique
-
OK, bit of an odd one. The SEOmoz crawler has flagged the following pages up as duplicate content. Does anyone have any idea what's going on?
http://www.gear-zone.co.uk/blog/november-2011/gear$9zone-guide-to-winter-insulation
http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone
http://www.gear-zone.co.uk/blog/july-2011/telephone-issues-$9-2nd-july-2011
http://www.gear-zone.co.uk/blog/september-2011/gear$9zone-guide-to-nordic-walking-poles
http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone
https://www.google.com/webmasters/tools/googlebot-fetch?hl=en&siteUrl=http://www.gear-zone.co.uk/
-
Good question, because those pages look different to a human. The SEOmoz web app uses a similarity threshold of 95% of the html code. This takes everything on the page, both hidden and visible into account.
In this case, it's counting all of the navigation and sidebar as well, which is significant. What's left of the unique content - the part that matters, makes up less than 5% of the code.
Here's a tool you can use to check the similarity: http://www.duplicatecontent.net/
I ran the pages through a couple of tools which showed 96% HTML similarity.
(but only a 92% text similarity - which is good, but not great)
For perspective, take a look at Google's cached versions of one of these pages. This is how googlebot sees the page: http://webcache.googleusercontent.com/search?q=cache:4fKrbNTUnegJ:www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone+http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone&hl=en&gl=us&strip=1G
Since Panda, when I see a site with this many navigation links, I usually advise them to restructure their site architecture into more of a Pyramid shape, so that you reduce the overall navigation on each page.
There are 2 ways to look at this: First of all, Google is much more sophisticated than SEOmoz at detecting duplicate content, and they are also better at contextual analysis - so they can probably tell these are not true duplicates.
Hope this helps! Best of luck with your SEO.
-
SEOmoz looks at the code on the page when it looks at duplicate content scores. My hunch is that there's a lot of identical code on those pages, which is causing the warning.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Identifying Duplicate Content
Hi looking for tools (beside Copyscape or Grammarly) which can scan a list of URLs (e.g. 100 pages) and find duplicate content quite quickly. Specifically, small batches of duplicate content, see attached image as an example. Does anyone have any suggestions? Cheers. 5v591k.jpg
Intermediate & Advanced SEO | | jayoliverwright0 -
Is all duplication of HTML title content bad?
In light of Hummingbird and that HTML titles are the main selling point in SERPs, is my approach to keyword rich HTML titles bad? Where possible I try to include the top key phrase to descripe a page and then a second top keyphrase describing what the company/ site as a whole is or does. For instance an estate agents site could consist of HTML title such as this Buy Commercial Property in Birmingham| Commercial Estate Agents Birmingham Commercial Property Tips | Commercial Estate Agents In order to preserve valuable characters I have also been omitting brand names other than on the home page... is this also poor form?
Intermediate & Advanced SEO | | SoundinTheory0 -
3rd Party hosted whitepapers — bad idea? Duplicate content?
It is common the B2B world to have 3rd parties host your whitepapers for added exposure. Is this a bad practice from an SEO point of view? Is the expectation that the 3rd parties use rel=canonical tags? I doubt most of them do . . .
Intermediate & Advanced SEO | | BlueLinkERP0 -
Product descriptions & Duplicate Content: between fears and reality
Hello everybody, I've been reading quite a lot recently about this topic and I would like to have your opinion about the following conclusion: ecommerce websites should have their own product descriptions if they can manage it (it will be beneficial for their SERPs rankings) but the ones who cannot won't be penalized by having the same product descriptions (or part of the same descriptions) IF it is only a "small" part of their content (user reviews, similar products, etc). What I mean is that among the signals that Google uses to guess which sites should be penalized or not, there is the ratio "quantity of duplicate content VS quantity of content in the page" : having 5-10 % of a page text corresponding to duplicate content might not be harmed while a page which has 50-75 % of a content page duplicated from an other site... what do you think? Can the "internal" duplicated content (for example 3 pages about the same product which is having 3 diferent colors -> 1 page per product color) be considered as "bad" as the "external" duplicated content (same product description on diferent sites) ? Thanks in advance for your opinions!
Intermediate & Advanced SEO | | Kuantokusta0 -
Are all duplicate content issues bad? (Blog article Tags)
If so how bad? We use tags on our blog and this causes duplicate content issues. We don't use wordpress but with such a highly used cms having the same issue it seems quite plausible that Google would be smart enough to deal with duplicate content issues caused by blog article tags and not penalise at all. Here it has been discussed and I'm ready to remove tags from our blog articles or monitor them closely to see how it effects our rankings. Before I do, can you give me some advice around this? Thanks,
Intermediate & Advanced SEO | | Daniel_B
Daniel.0 -
Is an RSS feed considered duplicate content?
I have a large client with satellite sites. The large site produces many news articles and they want to put an RSS feed on the satellite sites that will display the articles from the large site. My question is, will the rss feeds on the satellite sites be considered duplicate content? If yes, do you have a suggestion to utilize the data from the large site without being penalized? If no, do you have suggestions on what tags should be used on the satellite pages? EX: wrapped in tags? THANKS for the help. Darlene
Intermediate & Advanced SEO | | gXeSEO0 -
Duplicate Content / 301 redirect Ariticle issue
Hello, We've got some articles floating around on our site nlpca(dot)com like this article: http://www.nlpca.com/what-is-dynamic-spin-release.html that's is not linked to from anywhere else. The article exists how it's supposed to be here: http://www.dynamicspinrelease.com/what-is-dsr/ (our other website) Would it be safe in eyes of both google's algorithm (as much as you know) and with Panda to just 301 redirect from http://www.nlpca.com/what-is-dynamic-spin-release.html to http://www.dynamicspinrelease.com/what-is-dsr/ or would no-indexing be better? Thank you!
Intermediate & Advanced SEO | | BobGW0 -
Multiple cities/regions websites - duplicate content?
We're about to launch a second site for a different, neighbouring city in which we are going to setup a marketing campaign to target sales in that city (which will also have a separate office there as well). We are going to have it under the same company name, but different domain name and we're going to do our best to re-write the text content as much as possible. We want to avoid Google seeing this as a duplicate site in any way, but what about: the business name the toll free number (which we would like to have same on both sites) the graphics/image files (which we would like to have the same on both sites) site structure, coding styles, other "forensic" items anything I might not be thinking of... How are we best to proceed with this? What about cross-linking the sites?
Intermediate & Advanced SEO | | webdesignbarrie0