How do I use public content without being penalized for duplication?
-
The NHTSA publishes a list of all automobile recalls, and its terms of use state that the information may be copied. I want to add it to our site so there is an up-to-date list for our audience to see. However, I'd just be copying and pasting. I'm allowed to according to the NHTSA, but Google will probably flag it, right? Is there a way to do this without being penalized?
Thanks,
Ruben
-
I didn't think about other sites, but that's a fabulous point. Best to play it safe.
Thanks for your input!
- Ruben
-
My gut says that your idea to keep the content noindexed is best. Even if the content were unique, it borders on auto-generated. I might change my mind if a lot of users were actively interacting with most of these pages; if not, you'll end up with a large portion of your site consisting of auto-generated content that doesn't seem useful. Plus, it's also possible that other sites are using this information, so you would end up with content that is duplicated across other sites too.
I could be wrong, but my gut says not to try to use this content for ranking purposes.
-
I appreciate the follow-up, Marie. Please give me your thoughts on the following idea:
The NHTSA only posts the updates for the past month. If I noindex the page for now (which is what I'm doing) and wait five months, what would happen? At that point, yes, the current month would be duplicated, but I'd have four months of "unique" content because the NHTSA deletes theirs. Plus, I could add pictures of all the automobiles, too. Do you think that would be enough to index it?
(I'm most likely going to keep it noindexed, because this borders on shady, or at least I could see Google taking it that way, but just as a thought experiment, what do you think?) Or anyone else?
Thanks,
Ruben
-
To expand on EGOL's answer: if you take someone else's content (even with their permission) and want Google to index it, Google will see that you have a large amount of copied content on your site. This can trigger the Panda filter and cause Google to consider your whole site low quality.
You can add a noindex tag as EGOL suggested, or you could use a canonical tag to show Google who the originator of the content is, but the noindex tag is probably the easiest.
There is one other option as well. If you think you can add significant value to the content being provided, then you can still keep it indexed. If you can combine the recall information with other valuable information, that might be OK to index. But you have to be truly providing value, not just padding the page with words to make it look unique.
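For reference, a cross-domain canonical is a single line in the <head> of the republished page, pointing at the original source. The URL below is just a placeholder for whichever NHTSA page the data actually comes from:
<link rel="canonical" href="https://www.nhtsa.gov/recalls" />
Keep in mind that Google treats cross-domain canonicals as a hint rather than a directive, which is another reason the noindex tag is the more predictable choice.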
-
Alright, that sounds good. Thanks!
-
I had a bunch of republished articles on my website, posted mostly at the request of government agencies and universities. That site got hit in one of the early Panda updates. So I deleted a lot of that content, and to the rest I added this line above the </head> tag:
<meta name="robots" content="noindex, follow" />
That tells Google not to index the page but to follow the links and allow PageRank to flow through. My site recovered a few weeks later.
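For anyone unsure of the placement, here is a minimal sketch of where that line sits (the title is just a placeholder):
<head>
  <title>Recall Archive</title>
  <meta name="robots" content="noindex, follow" />
</head>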
Related Questions
-
Same site serving multiple countries and duplicated content
Hello! Though I browse Moz resources every day, I've decided to ask you a question directly, despite the numerous existing questions (and answers!) on this topic, as there are a few specific variants each time: I have a site serving content (and products) to different countries, built using subfolders (one subfolder per country). Basically, it looks like this:
site.com/us/
site.com/gb/
site.com/fr/
site.com/it/
etc. The first problem was fairly easy to solve:
Avoiding duplicate content issues across the board, considering that both the ecommerce part of the site and the blog are replicated in each subfolder in its own language. Correct me if I'm wrong, but using our copywriters to translate the content and adding the right hreflang tags should do it. But then comes the second problem: how to deal with duplicate content when it's written in the same language? E.g. /us/, /gb/, /au/ and so on.
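(For illustration, the hreflang annotations mentioned above would look something like this on each page, using the example subfolders; every page lists all of its variants, itself included, and same-language regions such as en-us, en-gb, and en-au are kept apart by their region codes rather than by canonicalization. The /some-page path is a placeholder:)
<link rel="alternate" hreflang="en-us" href="https://site.com/us/some-page" />
<link rel="alternate" hreflang="en-gb" href="https://site.com/gb/some-page" />
<link rel="alternate" hreflang="en-au" href="https://site.com/au/some-page" />
<link rel="alternate" hreflang="fr-fr" href="https://site.com/fr/some-page" />
<link rel="alternate" hreflang="it-it" href="https://site.com/it/some-page" />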
Given the following requirements/constraints, I can't see any positive resolution to this issue:
1. The structure needs to be maintained (it's not possible to consolidate the same language within one single subfolder, for example),
2. Articles can't be canonicalized from one subfolder to another, as that would mess up our internal tracking tools,
3. The volume of content being published prevents us from producing bespoke content for each region of the world sharing the same spoken language. Given those constraints, I can't see a way to solve this, and it seems I'm cursed to live with these duplicate content red flags right under my nose.
Am I right, or can you think of anything that would sort this out? Many thanks,
Ghill
-
Use hreflang on links without rel alternate?
Does it do any good to use hreflang on links without rel="alternate"? We have on each page a link to switch to another language, but it points to the language's root page, not to an alternate version of that specific article.
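For context, Google is generally understood to ignore hreflang on ordinary <a> links; it reads the attribute in <link rel="alternate"> elements in the <head>, in HTTP headers, or in sitemaps, and each annotation should point to a true equivalent of the current page rather than a language's root page. A minimal sketch with placeholder URLs:
<link rel="alternate" hreflang="en" href="https://example.com/en/article" />
<link rel="alternate" hreflang="nl" href="https://example.com/nl/artikel" />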
-
What is considered duplicate content?
Hi, We are working on a product page for bespoke camper vans: http://www.broadlane.co.uk/campervans/vw-campers/bespoke-campers . At the moment there is only one page, but we are planning to add similar pages for other brands of camper van. Each page will receive its own specifically targeted content; however, the 'Model choice' cart at the bottom (giving you the choice to select the internal structure of the van) will remain the same across all pages. Will this be considered duplicate content? And if this is the case, what would be the ideal solution to limit penalty risk? A rel canonical tag seems wrong for this, as there is no original item as such. Would an iframe around the 'Model choice' enable us to isolate the content from being indexed at the same time as the page? Thanks, Celine
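For what it's worth, the iframe idea would look something like the sketch below: the shared 'Model choice' block lives at its own URL, that URL is noindexed, and every product page frames it (the /shared/model-choice path is hypothetical). On each product page:
<iframe src="http://www.broadlane.co.uk/shared/model-choice" title="Model choice"></iframe>
And in the <head> of the framed page itself:
<meta name="robots" content="noindex" />
Google can still sometimes fold iframe content into the parent page, so treat this as a mitigation rather than a guarantee.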
-
Duplicate Page Content Issues Reported in Moz Crawl Report
Hi all, We have a lot of 'Duplicate Page Content' issues being reported in the Moz Crawl Report and I am trying to get to the bottom of why they are deemed errors... This page, http://www.bolsovercruiseclub.com/about-us/job-opportunities/, has (admittedly) very little content and is duplicated with http://www.bolsovercruiseclub.com/cruise-deals/cruise-line-deals/explorer-of-the-seas-2015/ (basically an image with just a couple of lines of static content). Also duplicated with http://www.bolsovercruiseclub.com/cruise-lines/costa-cruises/costa-voyager/ (a page relating to a single cruise ship, again with minimal content), with http://www.bolsovercruiseclub.com/faq/packing/ (an FAQ page, again with only a few lines of content), with http://www.bolsovercruiseclub.com/cruise-deals/cruise-line-deals/exclusive-canada-&-alaska-cruisetour/ (another page that features an image and NO content), and with http://www.bolsovercruiseclub.com/cruise-deals/cruise-line-deals/free-upgrades-on-cunard-2014-&-2015/?page_number=6 (a cruise deals page with a little static content and a lot of dynamic content, which I suspect isn't crawled). So my question is: is the duplicate content issue caused by the fact that each page has thin or no content? If that is the case, then I assume the simple fix is to add/increase the content? I realise that I may have answered my own question, but my brain is pickled at the moment and so I guess I am just seeking assurances! 🙂 Thanks, Andy
-
Category Pages For Distributing Authority But Not Creating Duplicate Content
I read this interesting Moz guide, http://moz.com/learn/seo/robotstxt, which I think answered my question, but I just want to make sure. I take it to mean that if I have category pages containing nothing but duplicate content (lists of other pages: H1 title/on-page description and links to the same), and I still want the category pages to distribute their link authority to the individual pages, then I should leave the category pages in the sitemap and meta noindex them, rather than block them in robots.txt. Is that correct? Again, I don't want the category pages to index or cause a duplicate content issue, but I do want them to be crawled enough to distribute their link authority to individual pages. Given the scope of the site (thousands of pages and hundreds of categories), I just want to make sure I have that right. Up until my recent efforts on this, some of the category pages have been robots.txt'd out and still in the sitemap, while others (with a different URL structure) have been in the sitemap but not robots.txt'd out. Thanks! Best.. Mike
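To make the distinction concrete: a category page you want crawled (so it can pass authority) but kept out of the index carries the tag below in its <head>, whereas a robots.txt Disallow would stop crawling altogether and block that flow:
<meta name="robots" content="noindex, follow" />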
-
How to Avoid Duplicate Content Issues with Google?
We have 1000s of audio book titles at our Web store. Google's Panda devalued our site some time ago because, I believe, of duplicate content. We get our descriptions from the publishers, which means a good deal of our description pages are the same as the publishers' = duplicate content according to Google. Although rewriting each description of the products we offer is a daunting, almost impossible task, I am thinking of rewriting publishers' descriptions using The Best Spinner software, which allows me to replace some of the publishers' words with synonyms. I have rewritten one audio book title's description, resulting in 8% unique content from the original in 520 words. I did a CopyScape check and it reported "65 duplicates." CopyScape appears to be reporting duplicates of words and phrases within sentences and paragraphs. I see very little duplicate content of full sentences or paragraphs. Does anyone know whether Google's duplicate content algorithm is the same as or similar to CopyScape's? How much of an audio book's description would I have to change to stay away from CopyScape's duplicate content algorithm? How much would I have to change to stay away from Google's?
-
WordPress and duplicate content
Hi, I have recently installed WordPress and started a blog, but now loads of duplicate pages are cropping up for tags, authors, dates, etc. How do I do the canonical thing in WordPress? Thanks, Ian
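For reference, WordPress core already outputs a rel="canonical" tag on single posts and pages; the usual fix for tag, author, and date archives is an SEO plugin that marks them noindex, so each archive's <head> ends up carrying something like this:
<meta name="robots" content="noindex, follow" />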
-
Managing Large Regulated or Required Duplicate Content Blocks
We work with a number of pharmaceutical sites that, under FDA regulation, must include an "Important Safety Information" (ISI) content block on each page of the site. In many cases this duplicate content is not only provided on a specific ISI page; it is quite often longer than what would be considered the primary content of the page. At first blush a rel=canonical tag might appear to be a solution, signaling to search engines that there is a specific page for the ISI content and avoiding a penalty, but the pages also contain original content that should be indexed, as it has user benefit beyond the information contained within the ISI. Is anyone else running into this challenge with regulated duplicate boilerplate, and have you developed a workaround for handling duplicate content at the paragraph level rather than the page level? One clever suggestion was to treat it as a graphic; however, for a pharma site this would be a huge graphic.