What tools do you use to find scraped content?
-
This hasn’t been an issue for our company so far, but I like to be proactive. What tools do you use to find sites that may have scraped your content?
Looking forward to your suggestions.
Vic
-
Oh, this belongs to a different thread: http://moz.com/community/q/chinese-site-ranking-for-our-brand-name-possible-hack
-
Is this part of the original conversation, or something else? Which sites are these?
-
I'm not sure we have been scraped as such though, because the site in question has different content.
It looks as though the offending site has hacked another site (which redirects to the offending site) but the hacked site is ranking for our brand name. Our homepage has lost all rankings it had (our category and product pages seem fine) and has essentially disappeared.
Can anyone else shed any light?
-
Siteliner (Copyscape's big brother) is really great and what we use first (plus I have a bookmarklet for it that makes it faster and easier to use).
We also use Linda's method of putting a bit of content in quotes. It's the easiest way to show an ecommerce client how much work they're going to require: paste three product descriptions into Google, watch the magic, and explain that the same thing would happen across all 15,000 products.
-
I spot check on a regular basis by taking a unique chunk out of a post, putting it in quotes, and doing a Google search on it. It's not comprehensive, but it is free. [And the main problems we have had with scrapers have been with sites that have taken huge portions of our content, not just an article or two, and a spot check roots those out.]
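If you want to make that spot check a little quicker, here is a minimal sketch in Python (the helper name is mine, not from any tool mentioned here) that wraps a snippet in quotes and builds the Google search URL for you. You still open the URL in a browser yourself; scripting automated queries against Google's results would violate their terms of service.

```python
# Minimal sketch of the quoted-phrase spot check described above.
from urllib.parse import quote_plus

def quoted_search_url(snippet: str) -> str:
    """Wrap the snippet in quotes so Google matches the exact phrase."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{snippet}"')

# Take a unique chunk out of a post and build the search URL for it.
url = quoted_search_url("a unique chunk taken out of a post")
print(url)
```

Paste the printed URL into a browser; any result that isn't your own site is a candidate scraper.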
-
Thanks, Chris & Jonathan. I will look into Copyscape. Good stuff!
-
Yep, Copyscape is what I use. I use a WordPress plugin that calls the Copyscape API, and I just check my main content every month or so with a single click.
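For the curious, a plugin like that boils down to a single HTTP request. Below is a rough Python sketch of a Copyscape Premium API "find copies of this page" call. The endpoint and parameter names (`u`, `k`, `o=csearch`, `q`) are my best recollection of the API, so treat them as assumptions and verify them against Copyscape's own documentation before relying on this.

```python
# Rough sketch of a Copyscape Premium API search-for-copies call.
# Endpoint and parameter names are assumptions from memory -- check
# Copyscape's API documentation before using this for real.
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://www.copyscape.com/api/"

def copyscape_request_url(username: str, api_key: str, page_url: str) -> str:
    """Build the request URL for a 'find copies of this page' search."""
    return API_BASE + "?" + urlencode({
        "u": username,   # Copyscape account name
        "k": api_key,    # Premium API key
        "o": "csearch",  # operation: search the web for copies of a URL
        "q": page_url,   # the page to check
    })

def copyscape_search(username: str, api_key: str, page_url: str) -> str:
    """Run the search and return the raw XML response."""
    with urlopen(copyscape_request_url(username, api_key, page_url)) as resp:
        return resp.read().decode("utf-8")
```

The paid API is what makes the per-page pricing mentioned below possible: each `csearch` call is one billed lookup.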
-
Copyscape works well for us. You can scan a couple of pages for free, and then it's $0.05/page after that.
Related Questions
-
Removing duplicated content using only NOINDEX at large scale (80% of the website)
Hi everyone, I am taking care of a large news website (500k pages), which took a massive hit from Panda because of duplicated content (70% was syndicated content). I recommended that all syndicated content be removed and that the website focus on original, high-quality content. However, this was implemented only partially. All syndicated content is set to NOINDEX (they think it is good for users to see standard news alongside the original HQ content). Of course it didn't help at all. No change after months. If I were Google, I would definitely penalize a website that has 80% of its content set to NOINDEX because it is duplicated. I would consider this site "cheating" and not worthy for the user. What do you think about this "theory"? What would you do? Thank you for your help!
White Hat / Black Hat SEO | Lukas_TheCurious
-
Common passwords used for spam accounts?
This is a bit of a longshot. I know that many of the spam forum accounts, blog posts, etc. that have in the past been used for SEO are generated automatically. Does anyone know of any common passwords that are often used when setting up these accounts? I only ask because, while trying to clean up the backlink profile for my website, I found myself in desperation keying in random passwords, trying to access the spam accounts created on various forums by our former SEO agency. Eventually I got lucky and worked out that the password for a series of forum accounts was, not very imaginatively, 'seo'. Having worked this out, I was able to delete the spam signatures on about 10 forums. But there are many other accounts where I have no idea of the password used. I guess I'm just wondering if there are standard stock passwords that were used in the past by many SEOs? Not likely to get an answer to this one, I know, but worth a shot.
White Hat / Black Hat SEO | mgane
-
Image Optimization & Duplicate Content Issues
Hello everyone, I have a new site that we're building which will incorporate some product thumbnail images cut and pasted from other sites, and I would like some advice on how to properly manage those images on our site. Here's one sample scenario from the new website: We're building furniture, and the client has the option of selecting 50 plastic laminate finish options from the Formica company. We'll cut and paste those 50 thumbnails of the various plastic laminate finishes and incorporate them into our site. Rather than sending our website visitors over to the Formica site, we want them to stay put on our site and select the finishes from our pages. The borrowed thumbnail images will not represent the majority of the site's content, and we have plenty of our own images and original content. As it does not make sense for us to order 50 samples from Formica and photograph them ourselves, what is the best way to handle the issue? Thanks in advance, Scott
White Hat / Black Hat SEO | ccbamatx
-
How do I make a content calendar to increase my rank for a key word?
I've watched more than a few seminars on having a content calendar. Now I'm curious as to what I would need to do to increase ranking for a specific keyword in local SEO. Let's say I wanted to help a client increase their rank for "used trucks in Buffalo, NY." Would I regularly publish blog posts about used trucks? Thanks!
White Hat / Black Hat SEO | oomdomarketing
-
Do I need to use meta noindex for my new website before migration?
I just want to know your thoughts: is it necessary to add a meta noindex,nofollow tag to each page of my new website before migrating the old pages to new pages under a new domain? Or would it be better to block the new site in robots.txt and then remove the block once we launch the new website? Thanks!
White Hat / Black Hat SEO | esiow2013
-
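One general caveat worth noting on the staging-site question above (a general point, not advice specific to this site): a crawler can only obey a meta noindex tag on a page it is allowed to fetch, so blocking the new site in robots.txt and adding noindex tags work against each other. A robots.txt block stops crawling, but a blocked URL can still be indexed from external links, and Google will never see a noindex tag on a page it cannot crawl. A sketch of the per-page tag, to be removed at launch:

```html
<!-- In the <head> of each staging page to be kept out of the index.
     The crawler must be able to fetch the page to see this tag,
     so do NOT also block the site in robots.txt. -->
<meta name="robots" content="noindex, nofollow">
```

HTTP authentication on the staging domain is another common option, since it keeps crawlers out entirely without leaving any tags to forget at launch.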
Negative SEO and when to use the Disavow tool?
Hi guys, I was hoping someone could help me with a problem that has arisen on the site I look after. This is my first SEO job, and I've had it about 6 months now. I think I've been doing the right things so far: building quality links from reputable sites with good DA and working with bloggers to push our products, as well as only signing up to directories in our niche. So our backlink profile is very specific, with few spammy links. Over the last week, however, we have received a huge increase in backlinks, which has almost doubled our linking domains total. I've checked the links in Webmaster Tools and they are mainly directories or webstat websites like the ones below: siteinfo.org.uk, deperu.com, alestat.com, domaintools.com, detroitwebdirectory.com, ukdata.com, stuffgate.com. We've also just launched a new initiative where we will be producing totally new, good-quality content 4-5 times a week, and many of these new links are pointing to that page, which looks very suspicious to me. Does this look like negative SEO to anyone? I've read a lot about the disavow tool, and it seems people's opinions are split on when to use it, so I was wondering if anyone had any advice on whether to use it or not. It's easy for me to identify what these new links are, yet some of them have decent DA, so will they do any harm anyway? I've also checked the referring anchors on Ahrefs, and now over 50% of my anchor term cloud is made up of terms totally unrelated to my site; this has happened over the last week, which also worries me. I haven't seen any negative impact on rankings yet, but if this carries on it will destroy my link profile. So would it be wise to disavow these links as they come through, or to wait and see if they actually have an impact? It should be obvious to Google that there has been a huge spike in links, so the question is whether they will be ignored or whether I will be penalised. Any ideas? Thanks in advance, Richard
White Hat / Black Hat SEO | Rich_995
-
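For anyone who does decide to disavow, the file format itself is simple: a plain-text file uploaded through Google Search Console, one directive per line. A sketch using two of the domains listed in the question above (the individual URL line is a hypothetical example, not from the question):

```
# Disavow file (plain text, uploaded via Google Search Console).
# Lines starting with # are comments.
# domain: drops every link from that host.
domain:siteinfo.org.uk
domain:alestat.com
# Individual URLs can also be listed (hypothetical example):
url:http://example.com/spammy-page.html
```

Disavowing a whole domain is usually safer than listing URLs one by one, since scraper and directory sites tend to duplicate the same link across many pages.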
Is it still valuable to place content in subdirectories to represent hierarchy or is it better to have every URL off the root?
Is it still valuable to place content in subdirectories to represent hierarchy on the site or is it better to have every URL off the root? I have seen websites structured both ways. It seems having everything off the root would dilute the value associated with pages closest to the homepage. Also, from a user perspective, I see the value in a visual hierarchy in the URL.
White Hat / Black Hat SEO | belcaro1986
-
Duplicate content or not? If you're using abstracts from external sources you link to
I was wondering if a page (a blog post, for example) that offers links to external web pages along with abstracts from those pages would be considered a duplicate content page and therefore penalized by Google. For example, I have a page that has very little original content (just two or three sentences that summarize or sometimes frame the topic) followed by five references to different external sources. Each reference contains a title, which is a link, and a short abstract, which is basically the first few sentences copied from the page it links to. So, except for a few sentences at the beginning, everything is copied from other pages. Such a page would be very helpful for people interested in the topic, as the sources it links to have been analyzed, handpicked, and placed there to enhance the user experience. But will this format be considered duplicate or near-duplicate content?
White Hat / Black Hat SEO | romanbond