Duplicate content
-
I run about 10 sites, and most of them seem to have fallen foul of the Penguin update. Even though I have never sought inorganic links, I have been frantically searching for a link-based answer since April.
However, since asking a question here I have been pointed in another direction by one of your contributors. It seems at least 6 of my sites have duplicate content issues.
If you search Google for "We have selected nearly 200 pictures of short haircuts and hair styles in 16 galleries", which is the first bit of text from the site short-hairstyles.com, about 30,000 results appear. I don't know where they're from, nor why anyone would want to do this. I presume it's automated, since there is so much of it.
I have decided to redo the content, so I guess (hope) that at some point in the future the duplicates will be flushed from Google's index?
But how do I prevent it happening again? It's impractical to redo the content every month or so.
For example, if you search for "This facility is written in Flash to use it you need to have Flash installed." from another of my sites, to which I coincidentally uploaded a new page a couple of days ago, only the duplicate content shows up, not my original site. So whoever is doing this is finding new material on my site and getting it indexed on Google before Google even sees it on my site!
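A minimal sketch of how such a spot check could be scripted (Python, illustrative only; the two snippets are the ones quoted above, so substitute sentences from your own pages):

```python
# Minimal sketch: build exact-match Google search URLs from distinctive
# sentences, for periodically spot-checking who has copied a page.
# The snippets below are the two examples from this thread; swap in
# sentences from your own pages.
from urllib.parse import quote_plus

snippets = [
    'We have selected nearly 200 pictures of short haircuts and hair styles in 16 galleries',
    'This facility is written in Flash to use it you need to have Flash installed.',
]

for s in snippets:
    # Wrapping the sentence in quotes forces an exact-phrase search, so
    # every result that isn't your own page is a candidate scraped copy.
    print('https://www.google.com/search?q=' + quote_plus(f'"{s}"'))
```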
Thanks,
Ian
-
I don't have any experience with Cloudflare, so I can't offer an opinion on their services. And without a proper audit of your site and link profile, there is no honest way to know exactly what the core issues are. Short of a proper audit, it's all guesswork. That's the bigger concern.
Maybe it's links. Maybe it's the perception of duplicate content. Maybe it's a dozen seemingly insignificant issues that accumulated to the breaking point, with a trigger event like Penguin.
Unfortunately, that's the reality of SEO in 2012.
-
OK, maybe I'm not getting something, or not explaining myself properly.
When I say things like "30,000 times", "every page" and "it is the majority of the content", what I mean is that this is not a trivial thing and that I have looked into it at length.
If you felt some verification was needed to answer the question, the information is there to look at.
Complex things are made up of lots of uncomplex things.
How strong is this site? Up until April I'd have said very strong: it came in at number 1 for several high-volume keywords (and still does in Bing and Yahoo).
As I said in the original question, I have decided to redo most of the content on this site anyway, so whether this whole issue is an issue or not isn't an issue.
The original question was: how do you prevent it happening again? Are rel-author, rel-publisher and Google+ the answer?
Or what about this? http://www.cloudflare.com/plans
-
"it is the majority of my content". that's what I asked originally - if it is the majority of content on individual pages. If that's true, it could be a cause of problems, however SEO is an extremely complex process with multiple algorithms so unfortunately, without a detailed review of the site, it's dangerous to assume that specific issue is the cause of your problems.
How strong is your site in other regards? Do you implement rel-author or rel-publisher code and tie it to a Google+ account to communicate you're the original source? Do you have enough other trust signals in place? There are many other similar questions that need to be answered before anyone can confidently make serious recommendations.
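If you want to verify that markup is in place, here is a minimal sketch (Python, illustrative only; the URL is a placeholder for one of your own pages) that fetches a page and reports any rel-author or rel-publisher links it finds:

```python
# Minimal sketch: fetch a page and report any rel="author" or
# rel="publisher" links found in it. The URL is a placeholder;
# point it at one of your own pages.
import urllib.request
from html.parser import HTMLParser

class RelLinkFinder(HTMLParser):
    """Collects href targets of <link>/<a> tags whose rel is author or publisher."""
    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Note: a multi-valued rel (e.g. rel="author me") would need
        # splitting; this sketch only handles the simple single-value case.
        if tag in ('link', 'a') and attrs.get('rel') in ('author', 'publisher'):
            self.found[attrs['rel']] = attrs.get('href')

html = urllib.request.urlopen('http://example.com/').read().decode('utf-8', 'replace')
finder = RelLinkFinder()
finder.feed(html)
print(finder.found)  # e.g. {'author': 'https://plus.google.com/.../posts'}
```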
-
1. Google doesn't seem to know this and has penalised my sites for something.
2. It is the majority of the content. It's pretty much all of it, up to 30,000 times.
3. I've lost 70% of my traffic via recent Google updates. That is THE overwhelming concern, which is why I came and joined this site.
I arrived at this point by asking this question (http://www.seomoz.org/q/penguin-issues). If you disagree with the track I was sent down, can you suggest a different one?
-
1. You're not generating the duplicate content, so there's nothing you can logically do about it on any kind of scalable frequency, let alone prevent it.
2. If it's not the majority of the content on a page, it's not a serious problem. In fact, it's common across the internet.
3. Don't allow non-issues to become an overwhelming concern. Focus on what you can do something about: the things that are more important, that really do have a negative impact on your SEO, and that are within your control.
-
OK, but the snippet is an exact match (in speech marks) and there are 30,000 of them - that's not just monkeys typing Shakespeare. Every page (300 or so) on that site has unique content, and more or less each page has up to 30,000 duplicates - most a lot fewer than 30,000, but a lot more than the 1 it should be. If there were a couple of coincidences, fine, but there's not.
-
Just finding a snippet as short as the examples you gave is not, in itself, a reason to be concerned about duplicate content. A typical page should have hundreds of words and should rank for whatever phrase or phrases you care about, not for a single sentence within the content.
If, on the other hand, you have the overwhelming majority of the content from one of your pages duplicated, that's a reason to be concerned.
So - how much content do you have on YOUR site on the page(s) in question? And have you checked to find out if the majority is duplicated? That's where the focus needs to be.
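If you want to put a number on that, here is a minimal sketch (Python, illustrative only), assuming you have saved the visible text of your page and of a suspected copy to local files - the filenames are placeholders:

```python
# Minimal sketch: estimate what share of a page's text also appears,
# verbatim, in a suspected copy. The filenames are placeholders for
# locally saved copies of the two pages' visible text.
import difflib

def duplicated_share(original: str, copy: str) -> float:
    """Fraction of `original` that difflib matches somewhere in `copy`."""
    matcher = difflib.SequenceMatcher(None, original, copy)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(original), 1)

original = open('my_page.txt', encoding='utf-8').read()
copy = open('scraped_copy.txt', encoding='utf-8').read()
print(f'{duplicated_share(original, copy):.0%} of the page is duplicated')
```

If the overwhelming majority of the page comes back as matched, that's the duplication worth worrying about.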