Duplicate content that looks unique
-
OK, bit of an odd one. The SEOmoz crawler has flagged the following pages as duplicate content. Does anyone have any idea what's going on?
http://www.gear-zone.co.uk/blog/november-2011/gear$9zone-guide-to-winter-insulation
http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone
http://www.gear-zone.co.uk/blog/july-2011/telephone-issues-$9-2nd-july-2011
http://www.gear-zone.co.uk/blog/september-2011/gear$9zone-guide-to-nordic-walking-poles
https://www.google.com/webmasters/tools/googlebot-fetch?hl=en&siteUrl=http://www.gear-zone.co.uk/
-
Good question, because those pages look different to a human. The SEOmoz web app uses a similarity threshold of 95% of the HTML code. This takes everything on the page into account, both hidden and visible.
In this case, it's also counting all of the navigation and sidebar code, which is significant. The unique content that remains, which is the part that matters, makes up less than 5% of the code.
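To make the threshold idea concrete, here is a minimal sketch in Python. It is not SEOmoz's actual algorithm, which isn't described here; it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the names `DUPLICATE_THRESHOLD`, `html_similarity`, `looks_duplicate` and the two sample pages are all hypothetical.

```python
from difflib import SequenceMatcher

# Assumed threshold, taken from the 95% figure quoted above.
DUPLICATE_THRESHOLD = 0.95


def html_similarity(html_a, html_b):
    """Return a 0..1 similarity ratio over the raw HTML, tags included."""
    # autojunk=False stops difflib from ignoring very common characters.
    return SequenceMatcher(None, html_a, html_b, autojunk=False).ratio()


def looks_duplicate(html_a, html_b):
    """Flag a pair when the raw-HTML ratio meets the assumed threshold."""
    return html_similarity(html_a, html_b) >= DUPLICATE_THRESHOLD


# Hypothetical pages: a large shared template around a short unique post.
template_head = (
    "<html><body><ul>"
    + "".join(
        f"<li><a href='/category-{i}'>Category {i}</a></li>" for i in range(50)
    )
    + "</ul><div class='post'>"
)
template_tail = "</div></body></html>"

page_a = template_head + "A guide to winter insulation." + template_tail
page_b = template_head + "Win a jacket in our competition." + template_tail

print(round(html_similarity(page_a, page_b), 3))
print(looks_duplicate(page_a, page_b))
```

The two sample posts share no sentences, yet the pair is flagged because the shared navigation markup dominates the character count.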
Here's a tool you can use to check the similarity: http://www.duplicatecontent.net/
I ran the pages through a couple of tools, which showed 96% HTML similarity but only 92% text similarity. That is good, but not great.
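The gap between HTML similarity and text similarity can be reproduced with a rough sketch like the one below. It assumes Python's standard `html.parser` and `difflib` modules as stand-ins for whatever the checking tools actually do; `VisibleText`, `visible_text`, `ratio` and the sample markup are hypothetical names and data chosen for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return the page's text nodes joined by single spaces."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def ratio(a, b):
    """Character-level similarity ratio between two strings (0..1)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical pages: markup-heavy shared navigation, short unique posts.
nav = "<ul>" + "".join(
    f"<li class='nav-item'><a class='nav-link' href='/shop/category-{i}'>"
    f"Cat {i}</a></li>"
    for i in range(30)
) + "</ul>"
page_a = nav + "<p>Our guide to winter insulation covers down and synthetic fills.</p>"
page_b = nav + "<p>Win a jacket by entering our September competition today.</p>"

html_sim = ratio(page_a, page_b)
text_sim = ratio(visible_text(page_a), visible_text(page_b))
print(f"HTML similarity: {html_sim:.2f}, text similarity: {text_sim:.2f}")
```

Stripping the tags removes most of the shared template weight, so the text score comes out lower than the HTML score, which is the same pattern as the 96% versus 92% figures above.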
For perspective, take a look at Google's cached version of one of these pages. This is how Googlebot sees the page: http://webcache.googleusercontent.com/search?q=cache:4fKrbNTUnegJ:www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone+http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone&hl=en&gl=us&strip=1G
Since Panda, when I see a site with this many navigation links, I usually advise the owner to restructure the site architecture into more of a pyramid shape, which reduces the overall navigation on each page.
There are two ways to look at this. First of all, Google is much more sophisticated than SEOmoz at detecting duplicate content, and it is also better at contextual analysis, so it can probably tell these are not true duplicates.
Hope this helps! Best of luck with your SEO.
-
SEOmoz looks at the code on the page when it looks at duplicate content scores. My hunch is that there's a lot of identical code on those pages, which is causing the warning.
Related Questions
-
Glossary index and individual pages create duplicate content. How much might this hurt me?
I've got a glossary on my site with an index page for each letter of the alphabet that has a definition. So the M section lists every definition (the whole definition). But each definition also has its own individual page (and we link to those pages internally so the user doesn't have to hunt down the entire M page). So I definitely have duplicate content ... 112 instances (112 terms). Maybe it's not so bad because each definition is just a short paragraph(?) How much does this hurt my potential ranking for each definition? How much does it hurt my site overall? Am I better off making the individual pages no-index? or canonicalizing them?
Intermediate & Advanced SEO | | LeadSEOlogist0 -
How to handle duplicate content with Bible verses
I have a friend who runs a site with Bible verses and different people's thoughts or feelings on them. Since I'm an SEO, he came to me with questions, and a duplicate content red flag popped up in my head. My clients all generate their own content, so I'm not familiar with this world. Since Bible verses appear all over the place, is there a way to address this from an SEO standpoint to avoid duplicate content issues? Thanks in advance.
Intermediate & Advanced SEO | | jeremyskillings0 -
Parameter Strings & Duplicate Page Content
I'm managing a site that has thousands of pages due to all of the dynamic parameter strings that are being generated. It's a real estate listing site that allows people to create a listing, and it is generating lots of new listings every day. The Moz crawl report is continually flagging A LOT (25k+) of the site pages for duplicate content due to all of these parameter string URLs. Example: sitename.com/listings & sitename.com/listings/?addr=street name. Do I really need to do anything about those pages? I have researched the topic quite a bit, but can't seem to find anything too concrete as to what the best course of action is. My original thinking was to add the rel=canonical tag to each of the main URLs that have parameters attached. I have also read that you can bypass that by telling Google what parameters to ignore in Webmaster Tools. We want these listings to show up in search results, though, so I don't know if either of these options is ideal, since each would cause the listing pages (pages with parameter strings) to stop being indexed, right? Which is why I'm wondering if doing nothing at all will hurt the site. I should also mention that I originally recommended the rel=canonical option to the web developer, who has pushed back, saying that "search engines ignore parameter strings." Naturally, he doesn't want the extra workload of setting up the canonical tags, which I can understand, but I want to make sure I'm giving him both the most feasible option for implementation and the best option to fix the issues.
Intermediate & Advanced SEO | | garrettkite0 -
Best strategy for duplicate content?
Hi everyone, We have a site where all product pages have more or less similar text (same printing techniques, etc.). The main differences are prices and images; the text is highly similar. We have around 150 products in every language. Moz's algorithm tells me to do something about duplicate content, but I don't really know what we could do, since the descriptions can't be changed to be very different. We essentially have paper bags in different colors and from different materials.
Intermediate & Advanced SEO | | JaanMSonberg0 -
Partial duplicate content and canonical tags
Hi - I am rebuilding a consumer website, and each product page will contain a unique product image and a sentence or two about the product (and we tend to use a lot of the same words in different ways across products). I'd like to have a tabbed area below the product info that talks about the overall product line, and this content would be duplicated across all the product pages (a "Why use our products" type of thing). I'd have this duplicate content also living on its own URLs so they can be found alone in the SERPs. Question is, do I need to add the canonical tag to this page, since there's partial duplicate content on the product pages? And if I did that, would my product pages go un-indexed? I understand how to handle completely duplicated content; it's the partial duplicate that I'm having difficulty figuring out.
Intermediate & Advanced SEO | | Jenny10 -
Best practice with duplicate content. Cd
Our website has recently been updated, and now it seems that all of our product pages look like this: cdnorigin.companyname.com/catagory/product. Google is showing these pages in search rather than companyname.com/catagory/product. Each product page does have a canonical tag that points to the cdnorigin page. Is this best practice? I don't think that cdnorigin.companyname etc. looks very good in search. Is there any reason why my designer would set the canonical tags up this way?
Intermediate & Advanced SEO | | Alexogilvie0 -
Google Translate for Unique Content
We are considering using the Google Translation tool to translate customer reviews into various languages for publication as indexable content both for users and for search engine long tail visibility and rankings. Does anyone have any experience, insights or caveats to share?
Intermediate & Advanced SEO | | edreamsbcn0 -
Is this duplicate content something to be concerned about?
On the 20th February a site I work on took a nose-dive for the main terms I target. Unfortunately I can't provide the URL for this site. All links have been developed organically, so I have ruled this out as something which could've had an impact. During the past 4 months I've cleaned up all WMT errors and applied appropriate redirects wherever applicable. During this process I noticed that mydomainname.net contained identical content to the main mydomainname.com site. Upon discovering this problem I 301 redirected all .net content to the main .com site. Nothing has changed in terms of rankings since doing this about 3 months ago. I also found paragraphs of duplicate content on other sites (competitors in different countries). Although entire pages haven't been copied, there is still enough content to highlight similarities. As this content was written from scratch and Google would've seen this within its crawl and index process, I wanted to get people's thoughts as to whether this is something I should be concerned about. Many thanks in advance.
Intermediate & Advanced SEO | | bfrl0