Duplicate content mess

Alex-Harford

One website I'm working with keeps an HTML archive of content from the various magazines they publish. Some articles were repeated across different magazines, sometimes up to 5 times. These articles were also reused as content elsewhere on the same website, resulting in up to 10 duplicates of the same article on one website.

With regard to the 5 duplicates that are not contained in the magazines, I can delete all but the highest-value copy of each (the deleted pages will return a 404; most don't have any external links). There are hundreds of occurrences of this, and it seems infeasible to 301 or noindex them all.

After seeing how their system works, I can canonical the remaining non-magazine duplicate to the corresponding original magazine version, but I can't canonical any of the other magazine versions to the original. I also can't delete the other duplicates, as they're part of the content of a particular issue of a magazine. The best thing I can think of is adding a link in the magazine duplicates to the original article, something along the lines of "This article originally appeared in...", though I get the impression the client wouldn't want to reveal that they used to share so much content across different magazines.
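To make the two pieces of markup concrete, here is a minimal sketch of what would be generated for each duplicate. The helper names, the example.com URLs and the magazine name are all hypothetical; only the `rel="canonical"` link element and the attribution wording come from the plan described above.

```python
from html import escape


def canonical_tag(original_url: str) -> str:
    """Build the <link> element for the <head> of a non-magazine duplicate."""
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


def attribution_link(original_url: str, magazine_name: str) -> str:
    """Build a visible attribution line for a magazine duplicate."""
    href = escape(original_url, quote=True)
    return (
        "<p>This article originally appeared in "
        f'<a href="{href}">{escape(magazine_name)}</a>.</p>'
    )


# Hypothetical URL and magazine name, for illustration only.
ORIGINAL = "https://www.example.com/magazine-a/issue-1/article.html"
print(canonical_tag(ORIGINAL))
print(attribution_link(ORIGINAL, "Magazine A"))
```

The canonical tag would go on the page that can be edited freely, and the attribution paragraph on the magazine copies where a canonical can't be set.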

The duplicate pages across the different magazines do differ slightly as a result of the different Contents menu for each magazine.
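As a rough way to check how close two such copies are, here is a small sketch using Python's standard `difflib`. The sample menu and article text are made up, and word-level comparison is just one simple choice; this is not how any search engine measures duplication.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a 0..1 similarity ratio between two pages, compared word by word."""
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()


# Made-up sample: the same article body under two different Contents menus.
article = "The quick guide to restoring vintage radios covers valves, dials and cabinets. " * 3
page_a = "Contents Issue 1 Features News Reviews " + article
page_b = "Contents Issue 2 Interviews Letters " + article

print(f"similarity: {similarity(page_a, page_b):.2f}")
```

A ratio close to 1 means the pages are essentially the same article, with only the menu differing.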

Do you think what I'm doing is at least an improvement on how things were, or is there something further I can do? Is adding the links enough?

Thanks.

Alex-Harford

You're right about the 301s, and noindex would be a massive task that I'm not sure is worthwhile. Also I'm not sure if I want to list hundreds of pages in robots.txt.

By "back to back" do you mean "compare link metrics"? A lot of these pages show as "No Data Available for this URL". Some of them are quite deep within the site, so I don't know if that's why or if Mozscape can tell that they're duplicate content. The articles that are not part of the magazines usually seem to have a PA of 30+, judging by my spot-checks, but even some of the copies outside the magazines that were duplicated from magazine articles have no data available, despite being easier to crawl than the magazine content.

Unity

If adding meta tags, redirects, etc. to all of the pages is too labor-intensive and the return from any SEO goodness on those pages is low, then perhaps you could just block search engines' access to certain sections of the website via the robots.txt file.
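A minimal sketch of that idea, assuming the duplicates happen to live under one directory (the `/archive/duplicates/` path is hypothetical; the real site may not be organised this way). It uses Python's standard `urllib.robotparser` to check which paths a rule would block before it goes live.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the /archive/duplicates/ section is an assumption,
# not the real site's structure.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/duplicates/
"""


def is_blocked(url_path: str, user_agent: str = "*") -> bool:
    """Return True if the rules above disallow crawling the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url_path)


print(is_blocked("/archive/duplicates/article-1.html"))
print(is_blocked("/magazine-a/issue-1/article.html"))
```

Bear in mind that robots.txt stops crawling rather than indexing, and a crawler that can't fetch a page also can't see any canonical or noindex tag on it.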

danatanseo

Given the way Alex describes the separate magazines, I am thinking they wouldn't like having the 301 redirects from a branding perspective. I like the idea of adding an attribution link to the original article. I have doubts about the "noindex" because I think that in many cases Google completely ignores this directive. I'm not sure it's worth going through all the trouble of doing.

Have you tried putting the "duplicates" back to back in Open Site Explorer? I am really curious to know what that looks like.

OlegKorneitchouk

Instead of deleting, you can just noindex the page and add a link to the original article.
Alternatively, instead of deleting, you can 301 redirect to the original article.

Either approach deals with the duplicate content issue for those pages.
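Both options can be sketched briefly. The noindex option is a robots meta tag in each duplicate's `<head>`; the redirect option needs a mapping from each duplicate to its original. The snippet below assumes an Apache server with mod_alias (the thread doesn't say what the site runs on), and the paths and example.com URLs are hypothetical.

```python
from typing import Dict, List

# Robots meta tag for the <head> of a duplicate that stays live.
NOINDEX_META = '<meta name="robots" content="noindex, follow">'


def redirect_rules(duplicate_to_original: Dict[str, str]) -> List[str]:
    """Turn a {duplicate path: original URL} map into Apache 'Redirect 301' lines."""
    return [
        f"Redirect 301 {duplicate} {original}"
        for duplicate, original in sorted(duplicate_to_original.items())
    ]


# Hypothetical mapping, for illustration only.
mapping = {
    "/features/article-copy.html": "https://www.example.com/magazine-a/issue-1/article.html",
    "/news/article-copy.html": "https://www.example.com/magazine-a/issue-1/article.html",
}
for rule in redirect_rules(mapping):
    print(rule)
```

With hundreds of duplicates, generating the rules from a spreadsheet export of duplicate and original URLs is likely less work than writing them by hand.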
