How do I use public content without being penalized for duplication?
-
The NHTSA produces a list of all recalls for automobiles. In their "terms of use" it states that the information can be copied. I want to add that to our site, so there is an up-to-date list for our audience to see. However, I'm just copying and pasting. I'm allowed to according to NHTSA, but google will probably flag it right? Is there a way to do this without being penalized?
Thanks,
Ruben
-
I didn't think about other sites, but that's a fabulous point. Best to play it safe.
Thanks for your input!
- Ruben
-
My gut says that your idea to keep the content noindexed is best. Even if the content is unique it borders on the area of auto-generated. Now, I might change your mind if there were a lot of users that were actively interacting with most of these pages. If not though, then you'll end up having a large portion of your site consisting of auto-generated content that doesn't seem useful. Plus...it's also possible that other sites are using this information so you would end up having content that is duplicated on other sites too.
I could be wrong, but my gut says not to try to use this content for ranking purposes.
-
I appreciate the follow up, Marie. Please give me your thoughts on the following idea;
The NHTSA only posts the updates for the past month. If I noindex the page for now (which is what I'm doing) and wait five months, then what would happen? At that point, yes, the current month would be duplicated, but I'd have four months of "unique" content because the NHTSA deletes there's. Plus, I could add pictures of all the automobile, too. Do you think that would be enough to index it?
(I'm most likely going to keep it noindex, because this borders on shady, or at least, I could see google taking it that way, but just as a thought experiment, what do you think?) Or anyone else?
Thanks,
Ruben
-
To expand on EGOL's answer, if you are taking someone else's content (even with their permission) and wanting Google to index it then Google can see that you have a large amount of copied content on your site. This can trigger the Panda filter and can cause Google to consider your whole site as low quality.
You can add a noindex tag as EGOL suggested or you could use a canonical tag to show Google who the originator of the content is, but probably the noindex tag is easiest.
There is one other option as well. If you think it is possible that you can add significant value to the content that is being provided then you can still keep it indexed. If you can combine the recall information with other valuable information then that might be ok to index. But, you have to truly be providing value, not just padding the page with words to make it look unique.
-
Alright, that sounds good. Thanks!
-
I had a bunch of republished articles on my website, done mostly at the request of government agencies and universities. That site got hit in one of the early Panda updates. So, I deleted a lot of that content and to the rest I added this line above the tag
name="robots" content="noindex, follow" />
That tells google not to index the page but to follow the links and allow pagerank to flow through. My site recovered a few weeks later.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Different language with direct translation: duplicate content, meta?
For a site that does NOT want a separate subdomain, or directory, or TLD for a country/language would the directly translated page (static) content/meta be duplicate? (NOT considering a translation of the term/acronym which could exist in another language) i.e. /SEO-city-state in English vs. /SEO-city-state Spanish -In this example a term/acronym that is the same in any language. Outside of duplicate content, are their other conflict potentials in rankings you can think of?
Intermediate & Advanced SEO | | bozzie3110 -
Magento products and eBay - duplicate content risk?
Hi, We are selling about 1000 sticker products in our online store and would like to expand a large part of our products lineup to eBay as well. There are pretty good modules for this as I've heard. I'm just wondering if there will be duplicate content problems if I sync the products between Magento and eBay and they get uploaded to eBay with identical titles, descriptions and images? What's the workaround in this case? Thanks!
Intermediate & Advanced SEO | | speedbird12290 -
Can a website be punished by panda if content scrapers have duplicated content?
I've noticed recently that a number of content scrapers are linking to one of our websites and have the duplicate content on their web pages. Can content scrapers affect the original website's ranking? I'm concerned that having duplicated content, even if hosted by scrapers, could be a bad signal to Google. What are the best ways to prevent this happening? I'd really appreciate any help as I can't find the answer online!
Intermediate & Advanced SEO | | RG_SEO0 -
News section of the website (Duplicate Content)
Hi Mozers One of our client wanted to add a NEWS section in to their website. Where they want to share the latest industry news from other news websites. I tried my maximum to understand them about the duplicate content issues. But they want it badly What I am planning is to add rel=canonical from each single news post to the main source websites ie, What you guys think? Does that affect us in any ways?
Intermediate & Advanced SEO | | riyas_heych0 -
How to Remove Joomla Canonical and Duplicate Page Content
I've attempted to follow advice from the Q&A section. Currently on the site www.cherrycreekspine.com, I've edited the .htaccess file to help with 301s - all pages redirect to www.cherrycreekspine.com. Secondly, I'd added the canonical statement in the header of the web pages. I have cut the Duplicate Page Content in half ... now I have a remaining 40 pages to fix up. This is my practice site to try and understand what SEOmoz can do for me. I've looked at some of your videos on Youtube ... I feel like I'm scrambling around to the Q&A and the internet to understand this product. I'm reading the beginners guide.... any other resources would be helpful.
Intermediate & Advanced SEO | | deskstudio0 -
Duplicate content: is it possible to write a page, delete it and use it for a different site?
Hi, I've a simple question. Some time ago I built a site and added pages to it. I have found out that the site was penalized by Google and I have neglected it. The problem is that I had written well-optimized pages on that site, which I would like to use on another website. Thus, my question is: if I delete a page I had written on site 1, can use it on page 2 without being penalized by Google due to duplicate content? Please note: site one would still be online. I will simply delete some pages and use them on site 2. Thank you.
Intermediate & Advanced SEO | | salvyy0 -
Why duplicate content for same page?
Hi, My SEOMOZ crawl diagnostic warn me about duplicate content. However, to me the content is not duplicated. For instance it would give me something like: (URLs/Internal Links/External Links/Page Authority/Linking Root Domains) http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110516 /1/1/31/2 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110711 0/0/1/0 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110811 0/0/1/0 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110911 0/0/1/0 Why is this seen as duplicate content when it is only URL with campaign tracking codes to the same content? Do I need to clean this?Thanks for answer
Intermediate & Advanced SEO | | nuxeo0 -
How to manage duplicate content?
I have a real estate site that contains a large amount of duplicate content. The site contains listings that appear both on my clients website and on my competitors websites(who have better domain authority). It is critical that the content is there because buyers need to be able to find these listings to make enquiries. The result is that I have a large number pages that contain duplicate content in some way, shape or form. My search results pages are really the most important ones because these are the ones targeting my keywords. I can differentiate these to some degree but the actual listings themselves are duplicate. What strategies exist to ensure that I'm not suffereing as a result of this content? Should I : Make the duplicate content noindex. Yes my results pages will have some degree of duplicate content but each result only displays a 200 character summary of the advert text so not sure if that counts. Would reducing the amount of visible duplicate content improve my rankings as a whole? Link back to the clients site to indicate that they are the original source Any suggestions?
Intermediate & Advanced SEO | | Mulith0