ROI on Policing Scraped Content
-
Over the years, tons of original content from my website (written by me) has been scraped by 200-300 external sites. I've been using Copyscape to identify the offenders. It is EXTREMELY time-consuming to identify the site owners, prepare an email with supporting evidence (screenshots), and follow up 2, 3, 15 times until they remove the scraped content. Filing DMCA takedowns is a final option for sites hosted in the US, but quite a few of the offenders are in China, India, Nigeria, and other places not subject to DMCA. Sometimes, when a site owner takes down scraped content, it reappears a few months or years later. It's exasperating.
My site already performs well in the SERPs; I'm not aware of a third-party site's scraped content outperforming my site for any search phrase.
Given my circumstances, how much effort do you think I should continue to put into policing scraped content?
-
I watch my traffic for increases and decreases. You can do that with Google Analytics; I do it with Clicky. When I see an important page lose traffic, I go looking.
One of my retail sites suddenly stopped selling a certain product category very well. When I looked into it, I found that hundreds of "made in China" blogs had scraped my content.
I also have images that are often grabbed, so I watch image search traffic for them too.
I have tens of thousands of pages on the web. It's hard to monitor all of them, but it is easy when you can download a traffic spreadsheet that shows % up and % down for each page, sort it, and then investigate. So I am being reactive instead of proactive. And, really, I don't look at it as ROI; it is loss prevention.
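The spreadsheet approach above (download per-page traffic, compute % up and % down, sort, investigate the worst drops) can be sketched in a few lines of Python. This is a minimal illustration, assuming a CSV export with hypothetical columns `url`, `previous`, and `current` (visits in two comparable periods); real Google Analytics or Clicky exports use their own column names and would need adapting. The 25% drop threshold and the 100-visit floor are arbitrary placeholders.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PageTraffic:
    url: str
    previous: int  # visits in the earlier comparison period
    current: int   # visits in the most recent period


def pct_change(page: PageTraffic) -> float:
    """Percent change in visits from the previous period to the current one."""
    if page.previous == 0:
        # No baseline to compare against: flat if still zero, otherwise growth.
        return 0.0 if page.current == 0 else float("inf")
    return (page.current - page.previous) / page.previous * 100.0


def load_export(text: str) -> list[PageTraffic]:
    """Parse a CSV export with hypothetical columns: url,previous,current."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        PageTraffic(row["url"], int(row["previous"]), int(row["current"]))
        for row in reader
    ]


def pages_to_investigate(pages, drop_threshold=-25.0, min_visits=100):
    """Pages that lost at least |drop_threshold| percent of visits, worst first.

    Pages with fewer than min_visits in the earlier period are skipped so that
    small, noisy pages don't crowd out the ones worth a closer look.
    """
    flagged = []
    for page in pages:
        change = pct_change(page)
        if page.previous >= min_visits and change <= drop_threshold:
            flagged.append((page.url, change))
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    export = """url,previous,current
/widgets,1200,600
/gadgets,800,760
/images/blue-widget,400,100
/tiny-page,20,2
"""
    for url, change in pages_to_investigate(load_export(export)):
        print(f"{url}: {change:.0f}%")
```

With the sample data, only `/images/blue-widget` (-75%) and `/widgets` (-50%) are flagged; `/gadgets` dipped only 5%, and `/tiny-page` is below the visit floor. Those flagged pages are where you would start checking for scraped copies.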
-
Thanks for the detailed suggestions!
As a follow-up: what metric do you use to decide which offenders to go after, and which ones to ignore? I simply don't have time to go after everybody who has copied my content, so I need a way to prioritize.
There are two obvious situations where action is warranted: first, when the infringement is committed by a competitor in my industry, and second, when the infringing content outperforms my own site in the SERPs. What else would you suggest?
Thanks again.
-
Over the years, tons of original content from my website (written by me) has been scraped by 200-300 external sites.
I have the same problem on multiple sites. Most of the time the scraping is not harmful. But on several occasions it has cost me thousands of dollars and forced me to abandon product lines and donate thousands of dollars' worth of inventory to Goodwill. Infringers have included the websites of many law firms, a state supreme court, a presidential candidate, an Ivy League law school, and many others. They may take images, video, or text.
It is EXTREMELY time-consuming to identify the site owners, prepare an email with supporting evidence (screenshots), and follow up 2, 3, 15 times until they remove the scraped content. Filing DMCA takedowns is a final option for sites hosted in the US,...
I am not an expert in intellectual property law, so nothing I do or say here is legal advice. Filing a DMCA notice can get you sued even if you are in the right. If you file one, all of the details, including your name and why you filed, will be easily available to the person or company you complained about. They can retaliate against you, call begging you to retract the notice, or do anything else they want against you.
If I contact someone two or three times without results, I go straight to a DMCA notice. One thing I can say about Google is that they generally respond promptly to requests to remove infringing content from their web and image SERPs. They also generally respond promptly about infringing content on Blogspot and YouTube. eBay will shut down auctions en masse in response to a DMCA notice if a seller or group of sellers is using your images or other property.
When infringing content is on a university, government agency, or prominent company's website, they usually respond immediately to notification. I usually contact a provost, legal department, or internal manager instead of writing to "webmaster", who was probably involved in the problem and simply does not understand intellectual property. I usually don't prepare a big document. An email pointing out the infringing work and offering a resolution of "take it down right away" will usually get fast results.
quite a few of the offenders are in China, India, Nigeria, and other places not subject to DMCA.
If you can't identify the owner of the website, or if they are outside of the USA, you can still file a DMCA notice to have the content removed from search engines or from websites like YouTube or Blogspot that have an international user community but are owned by a US company. Some of them will insist that you deal with their infringing member directly; in that case, having an attorney contact the member might yield quick results.
A lot of the professional spam is done from outside of the USA but there are a few spammers and simply arrogant cowboys in the USA. DMCA is the route to take, but you do risk retaliation with some of them.
Sometimes, when a site owner takes down scraped content, it reappears a few months or years later. It's exasperating.
Yep.
I spend a good amount of time protecting my content. The problem is so big that I can usually only afford to act when the scraping, infringing, or whatever is costing me money, or when my content is appearing on the website of an established business or organization that should have people in leadership positions who would not want that happening.
I watch my analytics for traffic drops, etc. Occasionally I go out looking for infringement. The cost of policing can be astronomical. I could have a full-time employee working on this if I were going after everyone, and it's not cost-effective. Most of the people who are grabbing your stuff are putting it on domains that can't damage your rankings.
A greater problem than verbatim theft, in my opinion, is people who grab your articles and simply rewrite them. You spent tons of time doing the research and preparing the presentation. They do a paragraph-by-paragraph rewrite into something that is not detectable or recognizable beyond its structure.
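The prioritization criteria raised in this thread (a competitor, content that outranks yours, content that is measurably costing you, and content on an established organization's site) plus the "contact two or three times, then DMCA" escalation can be turned into a simple triage list. This is a rough sketch, not a tested policy: the `Infringement` fields are hypothetical flags you would set by hand after checking each site, weighting every criterion equally is an assumption, and the default limit of three contacts before escalating is taken loosely from the rule of thumb mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Infringement:
    url: str
    is_competitor: bool = False             # site is in your industry
    outranks_original: bool = False         # copy beats you in the SERPs
    causing_measurable_loss: bool = False   # traffic/sales drop traced to it
    established_organization: bool = False  # business, university, agency...
    contact_attempts: int = 0               # emails sent without result


def score(case: Infringement) -> int:
    """Count matched criteria; every criterion is weighted equally (assumption)."""
    return sum([case.is_competitor, case.outranks_original,
                case.causing_measurable_loss, case.established_organization])


def next_step(case: Infringement, max_contacts: int = 3) -> str:
    """Ignore harmless copies; otherwise contact first, then escalate."""
    if score(case) == 0:
        return "ignore"
    if case.contact_attempts >= max_contacts:
        return "file DMCA notice"
    return "contact owner"


def triage(cases: list[Infringement]) -> list[Infringement]:
    """Cases worth pursuing, those matching the most criteria first."""
    pursuable = [c for c in cases if score(c) > 0]
    return sorted(pursuable, key=score, reverse=True)


if __name__ == "__main__":
    cases = [
        Infringement("competitor.example", is_competitor=True, outranks_original=True),
        Infringement("spam-blog.example"),
        Infringement("lawfirm.example", established_organization=True, contact_attempts=3),
    ]
    for case in triage(cases):
        print(case.url, "->", next_step(case))
```

Here the scraper blog with no matching criterion is dropped, the competitor is handled first, and the law firm that ignored three emails is escalated. Whether a DMCA notice is actually wise in a given case still depends on the retaliation and jurisdiction issues discussed in this thread.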
Good luck.