Is legacy duplicate content an issue?

Fammy

I am looking for some proof, or at least evidence, as to whether or not sites are being hurt by duplicate content.

The situation is that there were 4 content-rich newspaper/magazine-style sites that were basically just reskins of each other (a tactic used under a previous regime). The least busy of the sites has since been discontinued and 301'd to one of the others, but traffic on the discontinued site was so low as to be lost in the noise, so it is unclear whether that was of any benefit.

For the last ~2 years all the sites have had unique content going up, but the archives of articles are still on all 3 remaining sites. I would like to know whether to redirect, remove or rewrite the content, but it is a big decision. The number of duplicate articles? 263,114!

Is there a chance this is hurting one or more of the sites? Is there any way to prove it, short of actually doing the work?

Fammy

Hi Jen

We are in the fortunate/crazy situation where we have a custom CMS, so the actual redirects are not really a problem from a technical standpoint. The question is just whether we should.

The main site, the biggest and busiest, has a discussion board, a shop and a blog, which the others don't, so the articles are about 10% of the indexed content, and about 11% are unique. Of the other 2 sites, one has 0.003% unique articles and the other 1.829%... sounds pretty bad when I put it like that!

We haven't seen a noticeable dip, just generally disappointing performance. I think I will try to rope someone into doing a full CSI on the data.

Have you seen anywhere that has recovered from a comparable situation? The pondering at this end was that the damage was already done, and that was that.

thanks

AngelDigital

Hi Fammy!

One thing you could do is to look at the dates the Panda updates hit (http://moz.com/google-algorithm-change) against your website traffic for those dates. If you see a dip, you probably got hit.

If not, it's still possible that the duplicate content is holding back your visibility in the SERPs. You can sometimes guess this when you're adding new content and it doesn't really perform as you'd expect it to - but unfortunately, you won't know for sure until you take some action.
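To make that date comparison a bit less of an eyeball job, here is a rough Python sketch. It assumes you can export daily visit counts from your analytics package into a dictionary keyed by date; the function name `traffic_change`, the 14-day window and the sample numbers are all made up for illustration, and the date used is a placeholder, not a real Panda date.

```python
from datetime import date, timedelta
from statistics import mean

def traffic_change(daily_visits, update_day, window=14):
    """Relative change in mean daily visits across update_day.

    daily_visits: dict of date -> visit count (hypothetical analytics export)
    window: number of days averaged on each side of the update
    Returns None when there is not enough data to compare.
    """
    before = [daily_visits[update_day - timedelta(days=i)]
              for i in range(1, window + 1)
              if update_day - timedelta(days=i) in daily_visits]
    after = [daily_visits[update_day + timedelta(days=i)]
             for i in range(window)
             if update_day + timedelta(days=i) in daily_visits]
    if not before or not after or mean(before) == 0:
        return None
    return (mean(after) - mean(before)) / mean(before)

# Made-up data: 1000 visits a day for two weeks, then 800 a day.
start = date(2020, 1, 1)
visits = {start + timedelta(days=i): (1000 if i < 14 else 800) for i in range(28)}
print(traffic_change(visits, start + timedelta(days=14)))  # -0.2, i.e. a 20% drop
```

Run it once per update date from the list and per site. A drop that lines up with an update on one site but not the others is worth a closer look, though seasonality and weekday patterns can easily produce the same kind of number.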

Another thing to keep in mind is that you risk getting hit in the future - for example, by a manual penalty - which could even result in the sites being removed.

263,114 is a huge number of duplicate articles, and I was wondering what proportion that is of your overall number of site pages. If it is quite a high percentage, the risk is obviously greater.

Personally, I'd recommend you take some action. Is there any pattern in the way the archive of articles is structured that would make it possible to write a catch-all 301 rule in your .htaccess file, redirecting them all to one of the three sites?

For example, say your archived articles sit in a folder called archive. You'd put this in the .htaccess file on sites 1 and 2:

RewriteEngine on
RewriteBase /
RewriteRule ^archive/(.*)$ http://www.yoursite3.com/archive/$1 [R=301,L]

... and this would redirect anything in the archive directory to the archive directory on site 3, assuming the file names are exactly the same.
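Before switching a rule like this on for a quarter of a million URLs, it may be worth previewing where a sample of paths would end up. The Python sketch below mimics the pattern from the rule above with a regular expression; it is an approximation of what Apache does, not a replacement for testing on the server, and the function name `redirect_target` and the sample paths are invented for illustration.

```python
import re

# Same pattern and destination as the RewriteRule above.
ARCHIVE_RULE = re.compile(r"^archive/(.*)$")
TARGET = "http://www.yoursite3.com/archive/"

def redirect_target(path):
    """Return the 301 destination for a request path, or None if the rule would not match.

    Query strings are ignored here; this only looks at the path.
    """
    # In .htaccess context the leading slash is removed before matching.
    match = ARCHIVE_RULE.match(path.lstrip("/"))
    if match is None:
        return None
    return TARGET + match.group(1)

# Hypothetical sample paths.
for sample in ["/archive/2009/some-article.html", "/blog/some-post"]:
    print(sample, "->", redirect_target(sample))
```

Feeding it a list of real archive paths from sites 1 and 2 and checking that each destination actually exists on site 3 would catch any file names that don't match exactly.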

Alternatively, if that's not an option, you could look at which of the articles on sites 1 and 2 have decent links going to them, redirect those to your chosen site 3 and remove the rest, cutting the workload down a little.
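As a rough illustration of that triage, the sketch below splits article paths into a "redirect" list and a "remove" list based on how many external links each one has. It assumes you have already exported link counts per URL from whichever link tool you use; the function name `split_by_links`, the `min_links` threshold and the sample data are assumptions, and what counts as a "decent" link is a judgment call the code cannot make for you.

```python
def split_by_links(link_counts, min_links=1):
    """Split article paths into (redirect, remove) lists.

    link_counts: dict of article path -> number of external links
                 (hypothetical export from a link analysis tool)
    min_links: smallest link count that earns an article a redirect
    """
    redirect = sorted(p for p, n in link_counts.items() if n >= min_links)
    remove = sorted(p for p, n in link_counts.items() if n < min_links)
    return redirect, remove

# Made-up sample data.
counts = {
    "/archive/2008/big-story.html": 12,
    "/archive/2008/minor-note.html": 0,
    "/archive/2009/feature.html": 3,
}
to_redirect, to_remove = split_by_links(counts, min_links=2)
print(len(to_redirect), "to redirect,", len(to_remove), "to remove")
```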

