Best Way to Handle Near-Duplicate Content?
-
Hello Dear MOZers,
I'm having duplicate content issues and I'd like some opinions on how best to deal with them.
Background: I run a website for a cosmetic surgeon, and its most valuable content is the section of before/after photos of our patients. We have 200+ pages (one patient per page), and each page has a 'description' block of text and a handful of before-and-after photos. The photo labels are nearly identical from patient to patient ("before surgery", "after surgery", "during surgery", etc.). Currently, each page has a unique rel=canonical tag, but Moz Crawl Diagnostics has flagged these pages as duplicate content of each other. For example, a 'similar page checker' found two of these pages to be 97% similar.
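To get a feel for how a figure like "97% similar" can arise, here is a minimal Python sketch that compares two pages by their overlapping three-word sequences (shingles) and Jaccard similarity. This is only one common approach; the thread does not say how Moz or the similarity checker computes its score, and the sample texts and function names are made up for illustration.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard_similarity(a, b, k=3):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)

# Hypothetical page texts: shared photo labels, different patient details.
page_a = "before surgery after surgery during surgery patient 101 rhinoplasty"
page_b = "before surgery after surgery during surgery patient 102 facelift"

print(f"similarity: {jaccard_similarity(page_a, page_b):.2f}")
```

When the shared template and photo labels make up most of the words on a page, a short unique description does little to move a score like this.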
As far as I understand, there are a few ways to deal with this, and I'd like to get your opinions on the best course.
- Add 150+ more words to each description text block
- Prevent indexing of patient pages with robots.txt
- Set the rel=canonical for each patient page to the main gallery page
- Any other options or suggestions?
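Before changing canonicals, it can help to check what each patient page currently declares. The following is a rough Python sketch that pulls the rel=canonical URL out of a page's HTML using only the standard library. The sample HTML and the example.com URLs are placeholders, not the real site.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr.get("href")

def find_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Placeholder page: a patient page whose canonical points at itself.
PAGE = """
<html><head>
<title>Patient 101</title>
<link rel="canonical" href="https://www.example.com/gallery/patient-101/" />
</head><body></body></html>
"""

print(find_canonical(PAGE))
```

A canonical that points at the page itself keeps that page eligible for indexing, whereas pointing every patient page at the main gallery asks search engines to treat the gallery as the preferred version.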
Please keep in mind that this is our most valuable content, so I would be reluctant to make major structural changes, or changes that would result in any decrease in traffic to these pages.
Thank you folks,
Ethan
-
-
Thank you for the response, Marie. My main concern at the moment is SEO, because the content was flagged as duplicate in Moz Crawl Diagnostics and I want to avoid being penalized for duplicate content. Still, I appreciate the comments on performance vs. SEO. Thanks again.
-
My answer to this question would depend on how well this content is being digested by visitors to your site. My concern wouldn't be so much duplicate content as the potential for thin content.
Let's say that 5% of these pages are well received and the other 95% are almost never engaged with. Then I'd want to do something to get some of this content out of Google's index. But let's say that almost all of these pages are getting Google visits. In that case, I'd keep them just as they are.
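As a rough illustration of that check, here is a small Python sketch that splits pages by how many Google visits they received. The visit counts, URLs, and the threshold of 10 are all invented for the example; in practice the numbers would come from your own analytics export, and the cutoff is a judgment call.

```python
def triage_pages(visits_by_url, min_visits=10):
    """Split pages into those worth keeping indexed and those to review."""
    keep = sorted(url for url, visits in visits_by_url.items() if visits >= min_visits)
    review = sorted(url for url, visits in visits_by_url.items() if visits < min_visits)
    share_kept = len(keep) / len(visits_by_url) if visits_by_url else 0.0
    return keep, review, share_kept

# Hypothetical Google visit counts per patient page for a recent period.
visits = {
    "/gallery/patient-101/": 240,
    "/gallery/patient-102/": 3,
    "/gallery/patient-103/": 0,
    "/gallery/patient-104/": 57,
}

keep, review, share_kept = triage_pages(visits, min_visits=10)
print(f"{share_kept:.0%} of pages clear the threshold")
print("review:", review)
```

If nearly every page clears the threshold, that supports leaving the pages alone; a long review list points toward consolidating or deindexing.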
I wouldn't add text to these just to try to make them look like they're not duplicate content. That's not likely to add value for users. One possible solution is to group these into categories, if possible: instead of indexing, say, 10 individual pages, you could have 10 before-and-after photos on one page. If you do this, be sure to redirect the old URLs to their new category page.
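If you go the category route, a redirect plan can be drafted before touching the server. This Python sketch builds a map from old patient URLs to new category URLs. The procedure names and URL patterns are hypothetical, and how the 301s are actually configured depends on your web server or CMS.

```python
from collections import defaultdict

def build_redirect_map(category_by_url, category_base="/gallery/"):
    """Map each old patient URL to the URL of its new category page."""
    return {
        old_url: f"{category_base}{category}/"
        for old_url, category in category_by_url.items()
    }

def group_by_target(redirect_map):
    """List the old URLs that will land on each category page."""
    grouped = defaultdict(list)
    for old_url, new_url in sorted(redirect_map.items()):
        grouped[new_url].append(old_url)
    return dict(grouped)

# Hypothetical assignment of patient pages to procedure categories.
categories = {
    "/gallery/patient-101/": "rhinoplasty",
    "/gallery/patient-102/": "facelift",
    "/gallery/patient-104/": "rhinoplasty",
}

redirects = build_redirect_map(categories)
for old_url, new_url in sorted(redirects.items()):
    print(f"301 {old_url} -> {new_url}")
```

Grouping by target makes it easy to confirm that every retired page has a destination and that no category page ends up empty.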
There are other solutions as well, such as noindexing the pages that rarely get Google traffic or hiding them behind a robots block. But to me, the answer would really depend on how much Google traffic they are currently getting.
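One thing to keep in mind with these two options: a robots.txt block stops compliant crawlers from fetching a URL, while a noindex directive only works if the page can still be crawled, so the two are generally not combined on the same page. Here is a minimal Python sketch, using the standard library's robots.txt parser, to test which URLs a draft rule would block. The rule and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

# Draft rule (placeholder): block individual patient pages, not the gallery.
ROBOTS_TXT = """\
User-agent: *
Disallow: /gallery/patient-
"""

for url in (
    "https://www.example.com/gallery/",
    "https://www.example.com/gallery/patient-101/",
):
    status = "blocked" if is_blocked(ROBOTS_TXT, url) else "allowed"
    print(f"{status}: {url}")
```

Testing a draft rule this way helps avoid accidentally blocking the main gallery along with the low-traffic pages.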