Remove Scraped Content?
-
I work for a site whose content is not the top result when you search Google for a snippet of its text. I believe they wrote blog posts and articles, added them to their own site and to article directories at the same time, and the article directories got cached first.
If we're not coming up first for our article, that means we are not believed to be the original author, correct?
Should I remove all content from our site where this is happening, even though we actually did create these articles?
-
I explained the answer to this in the second part of my original post.
-
I would hope you had a link back to your site wherever possible. If not, the page should carry its creation and last-updated dates, which Google can see. I would not leave anything to guesswork, though: make sure you have links, and I would even put the posting date on the post on your site, as news articles do. It is just another indicator.
I would not remove the content if, in fact, it did originate from you.
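If you want to check the link-back point across many directory copies, here is a minimal sketch using only the Python standard library. It assumes you have already saved the HTML of a syndicated copy; the `directory_copy` markup, the `example.com` domain, and the `links_back_to` helper are all made up for illustration, and this only tells you whether a link exists, not how Google treats it.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(html, domain):
    """Return True if any link in `html` points at `domain` or a subdomain of it."""
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        host = (urlparse(href).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Hypothetical syndicated copy of an article, as it might appear in a directory.
directory_copy = """
<article>
  <p>Ten tips for better widgets...</p>
  <p>Originally published at
     <a href="https://www.example.com/blog/widgets">example.com</a>.</p>
</article>
"""

has_link = links_back_to(directory_copy, "example.com")
no_link = links_back_to("<p>No links here.</p>", "example.com")
print(has_link, no_link)
```

Copies that come back without a link are the ones worth chasing up with the directory, or at least noting down.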
-
Yes, it was intentionally distributed. I would like to know whether the duplicate content on our site is being seen (by Google) as copied, unoriginal, scraped, or pulled from another source because we're so lazy we can't come up with any material of our own.
If this is the case, I will be removing the content, as the quality of the content sucks and there is quite a bit of it. Please, do not respond "if the content sucks, then why have it on your site..."
-
The term "scraped content" is most often used for content that has been grabbed from your website by a visiting robot.
Based upon your posting, the duplicate content that you are talking about was intentionally distributed.
-
Then how do you determine if Google is seeing content as scraped? As you know, Google has made it very clear recently how they feel about scraped content.
-
"If we're not coming up first for our article, that means we are not believed to be the original author, correct?"
Search engines cannot identify original authors (unless you use the rel="author" attribute, and then they are merely taking your word for it). They only know which page with the content was discovered first. The content could have been on other pages first, or it could have been published offline first. Search engines don't have divine powers.
The page that ranks first in the SERPs is the one with the best combination of relevance, domain authority and other ranking factors. It has nothing to do with authorship.
"Should I remove all content from our site where this is happening, even though we actually did create these articles?"
I would not do that if the content is valuable for your visitors, has acquired links from other sites, or is pulling traffic from search.
The takeaway is not to give your content away if you want to rank for it in search. Giving it away can create strong competitors and feed existing ones.
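If there are a lot of articles to go through, the keep-or-remove test above can be written down as a simple filter. This is only a sketch: the `Page` fields, the sample URLs and the numbers are invented for illustration, and you would fill them in from your own analytics and link reports. Whether a page is valuable to visitors stays a judgment call you make by hand.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    useful_to_visitors: bool     # editorial judgment, set by hand
    inbound_links: int           # links acquired from other sites
    monthly_search_visits: int   # visits arriving from search


def should_keep(page: Page) -> bool:
    """Keep a page if it meets any one of the three criteria."""
    return (
        page.useful_to_visitors
        or page.inbound_links > 0
        or page.monthly_search_visits > 0
    )


# Hypothetical inventory of syndicated articles.
pages = [
    Page("/articles/widget-guide", True, 0, 0),
    Page("/articles/linked-post", False, 4, 0),
    Page("/articles/traffic-post", False, 0, 120),
    Page("/articles/thin-post", False, 0, 0),
]

# Pages meeting none of the criteria are candidates for removal or a rewrite.
to_review = [p.url for p in pages if not should_keep(p)]
print(to_review)
```

Anything left in `to_review` has no visitors, links or search traffic to lose, so removing or rewriting it costs you little.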