Best Way to Handle Near-Duplicate Content?

BernsteinMedicalNYC

Hello Dear MOZers,

I'm having duplicate content issues, and I'd like some opinions on how best to deal with them.

Background: I run a website for a cosmetic surgeon in which the most valuable content area is the section of before/after photos of our patients. We have 200+ pages (one patient per page), and each page has a 'description' block of text and a handful of before and after photos. The photos carry very similar labels from patient to patient ("before surgery", "after surgery", "during surgery", etc.). Currently, each page has a unique rel=canonical tag, but MOZ Crawl Diagnostics has flagged these pages as duplicate content of each other. For example, using a 'similar page checker', two of these pages were found to be 97% similar.
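For anyone who wants to reproduce a similarity figure like the one above, here is a minimal sketch that compares the visible text of two pages using word shingles and Jaccard similarity. This is an assumption about how such a checker might work, not the method used by MOZ Crawl Diagnostics or by the 'similar page checker' mentioned above; the function names and the shingle size `k=3` are illustrative.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(text_a, text_b, k=3):
    """Jaccard similarity (0.0 to 1.0) of the two texts' shingle sets."""
    set_a, set_b = shingles(text_a, k), shingles(text_b, k)
    if not set_a and not set_b:
        return 1.0  # two empty texts are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical page text, for illustration only.
page_a = "Before surgery. After surgery. During surgery. Patient had one session."
page_b = "Before surgery. After surgery. During surgery. Patient had two sessions."
print(f"{jaccard_similarity(page_a, page_b):.0%} similar")
```

Pages that share the same photo labels and template text will score high even when the descriptions differ, which is consistent with the pattern described above.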

As far as I understand there are a few ways to deal with this, and I'd like to get your opinions on the best course.

1. Add 150+ more words to each description text block
2. Prevent indexing of patient pages with robots.txt
3. Set the rel=canonical for each patient page to the main gallery page
4. Any other options or suggestions?

Please keep in mind that this is our most valuable content, so I would be reluctant to make major structural changes, or changes that would result in any decrease in traffic to these pages.

Thank you folks,

Ethan

BernsteinMedicalNYC

Thank you for the response, Marie. My main concern at the moment is SEO, because the content was flagged as duplicate in MOZ Crawl Diagnostics and I want to avoid being penalized for duplicate content. Still, I appreciate the comments on performance vs. SEO. Thanks again.

MarieHaynes

My answer to this question would depend on how well this content is being digested by visitors to your site. My concern wouldn't be so much with duplicate content as with the potential for thin content.

Let's say that 5% of these pages are received well and the other 95% are almost never engaged with. Then I'd want to do something to get some of this content out of Google's index. But let's say that almost all of these pages were getting Google visits. If that were the case, I'd keep them just as they are.

I wouldn't add text to these just to try to make them look like they're not duplicate content. That's not likely to add value for users. One possible solution is to group these into categories, if possible, so that instead of indexing, say, 10 individual pages, you have 10 before and after photos on one page. If you do this, be sure to redirect the old URLs to their new category page.
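If you go the category route, the redirect step can be sketched as a simple mapping from each old patient URL to its new category page. Everything below is hypothetical (the URL paths, the category names, and the `build_redirect_map` helper); it only illustrates the bookkeeping, and the resulting map would still need to be turned into permanent redirects in whatever server or CMS you use.

```python
def build_redirect_map(page_categories, category_urls):
    """Map each old patient URL to the URL of its new category page.

    page_categories: {old_url: category_name}
    category_urls:   {category_name: new_category_url}
    """
    redirects = {}
    for old_url, category in page_categories.items():
        if category not in category_urls:
            # Fail loudly rather than leave an old URL without a target.
            raise KeyError(f"no category page defined for {category!r}")
        redirects[old_url] = category_urls[category]
    return redirects


# Hypothetical URLs and categories, for illustration only.
page_categories = {
    "/gallery/patient-001/": "category-a",
    "/gallery/patient-002/": "category-a",
    "/gallery/patient-003/": "category-b",
}
category_urls = {
    "category-a": "/gallery/category-a/",
    "category-b": "/gallery/category-b/",
}

for old, new in build_redirect_map(page_categories, category_urls).items():
    print(f"{old} -> {new}")
```

Raising an error for an unmapped category is a deliberate choice here: an old URL with no redirect target would simply return a 404 after the pages are merged.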

There are other solutions as well, such as noindexing the pages that rarely get Google traffic or hiding them behind a robots block. But to me, the answer would really depend on how much Google traffic they are currently getting.
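To make that decision concrete, here is a rough sketch that splits pages into "keep as is" and "review" based on organic visit counts exported from your analytics tool. The `min_visits` threshold, the URLs, and the visit numbers are all made-up assumptions; pick a cutoff and a time window that make sense for your own traffic.

```python
def triage_pages(organic_visits, min_visits=10):
    """Split pages by Google (organic) visits over some reporting period.

    organic_visits: {url: visit_count}
    Returns (keep, review): pages at or above the threshold, and pages
    below it that are candidates for noindexing or consolidation.
    """
    keep, review = [], []
    for url, visits in sorted(organic_visits.items()):
        if visits >= min_visits:
            keep.append(url)
        else:
            review.append(url)
    return keep, review


# Hypothetical numbers, for illustration only.
visits = {
    "/gallery/patient-001/": 240,
    "/gallery/patient-002/": 3,
    "/gallery/patient-003/": 0,
}
keep, review = triage_pages(visits)
print("Keep as is:", keep)
print("Review for noindex or consolidation:", review)
```

If nearly everything lands in the "keep" list, that supports leaving the pages alone; a long "review" list points toward the noindex or category options.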
