What tools do you use to find scraped content?
-
This hasn’t been an issue for our company so far, but I like to be proactive. What tools do you use to find sites that may have scraped your content?
Looking forward to your suggestions.
Vic
-
Oh, this belongs to a different thread: http://moz.com/community/q/chinese-site-ranking-for-our-brand-name-possible-hack
-
Is this part of the original conversation, or something else? Which sites are these?
-
I'm not sure we've actually been scraped, though, because the site in question has different content.
It looks as though the offending site has hacked another site (which redirects to the offending site), but the hacked site is ranking for our brand name. Our homepage has lost all the rankings it had (our category and product pages seem fine) and has essentially disappeared.
Can anyone else shed any light?
-
Siteliner (Copyscape's big brother) is really great and is what we use first (plus I have a bookmarklet for it to make it faster and easier to use).
Also use Linda's method of taking a bit of content in quotes. It's the easiest way to show an ecommerce client how much work they're going to require: paste three product descriptions into Google, watch the magic, and explain that this would happen across all 15,000 products.
-
I spot check on a regular basis by taking a unique chunk out of a post, putting it in quotes, and doing a Google search on it. It's not comprehensive, but it is free. [And the main problems we have had with scrapers have been with sites that have taken huge portions of our content, not just an article or two, and a spot check roots those out.]
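That quoted-phrase spot check is easy to script. A minimal sketch (Python; the snippet text is invented for the example) that just builds the exact-match search URLs to open by hand, since Google blocks automated querying:

```python
from urllib.parse import quote_plus

def exact_match_query_url(snippet):
    """Build a Google search URL for an exact-match (quoted) phrase."""
    return "https://www.google.com/search?q=" + quote_plus('"' + snippet + '"')

# Print the URLs and open them by hand; this only prepares the
# spot-check searches, it doesn't run them.
for snippet in ["a unique chunk taken straight out of one post"]:
    print(exact_match_query_url(snippet))
```

Feed it one distinctive sentence per post and you have a repeatable monthly checklist.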
-
Thanks, Chris & Jonathan. I will look into Copyscape. Good stuff!
-
Yep, Copyscape is what I use. I use a WordPress plugin that uses the Copyscape API and just check my main content every month or so with a simple click.
-
Copyscape works well for us. You can scan a couple of pages for free, and then it's $0.05/page after that.
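For anyone who wants to run Copyscape checks on a schedule rather than by hand, here is a rough sketch of building a request URL for Copyscape's paid API. The parameter names (u, k, o, q) and the csearch operation are my recollection of their documented API, not verified here - check Copyscape's API documentation before relying on them:

```python
from urllib.parse import urlencode

# Assumption: parameter names (u=username, k=API key, o=operation,
# q=page URL) follow Copyscape's Premium API; verify against their
# current documentation before use.
def copyscape_url_search(username, api_key, page_url):
    params = {"u": username, "k": api_key, "o": "csearch", "q": page_url}
    return "https://www.copyscape.com/api/?" + urlencode(params)

print(copyscape_url_search("your-username", "your-api-key",
                           "https://example.com/some-post"))
```

A cron job looping this over your top pages gives you roughly what the WordPress plugin does, minus the click.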
Related Questions
-
Are online tools considered thin content?
My website has a number of simple converters. For example, this one converts spaces to commas: https://convert.town/replace-spaces-with-commas
Now, obviously there are loads of different variations I could create of this:
Replace spaces with semicolons
Replace semicolons with tabs
Replace full stops with commas
Similarly with files:
JSON to XML
XML to PDF
JPG to PNG
JPG to TIF
JPG to PDF
(and thousands more)
If someone types one of those into Google, they will be happy because they can immediately use the tool they were hunting for. It is obvious what these pages do, so I do not want to clutter the page up with unnecessary content. However, would these be considered doorway pages or thin content, or would it be acceptable (from an SEO perspective) to generate thousands of pages based on all the permutations?
White Hat / Black Hat SEO | ConvertTown
-
Our webpage has been embedded in another website using an iframe. Does this hurt our rankings?
Hi all, One of our partners has embedded our webpage in their website using an iframe, in such a way that anybody can browse to any page of our website from theirs. It's said that content in iframes does not get indexed, but when I Google the h1 tag of our embedded webpage, I can see their website in the SERP but not ours, which is the original. How has it been indexed? Is this hurting our rankings? Thanks
White Hat / Black Hat SEO | vtmoz
-
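On the iframe question above, one hedged aside in case the goal becomes stopping the framing altogether rather than out-ranking it: browsers honour response headers that refuse framing on other origins. The nginx syntax below is purely illustrative; adapt it to your own server.

```nginx
# Illustrative nginx config: refuse to be rendered inside frames on
# other origins. CSP frame-ancestors supersedes X-Frame-Options in
# modern browsers; sending both also covers older ones.
add_header X-Frame-Options "SAMEORIGIN" always;
add_header Content-Security-Policy "frame-ancestors 'self'" always;
```

-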
Does Google give any advantage to Webmaster Tools verified sites?
Hello friends, I am seeing a strange pattern. I registered 2 new domains, built sites on them, added no backlinks, only put up content, and did the on-page SEO right. After 1 month of Google indexing, both sites were not showing in search for their targeted keywords, but as soon as I added them to Google Webmaster Tools they both automatically came up at positions 16 and 24 for their specific keywords. So my question is: does Google give any advantage, in terms of SEO or authority, to sites which are verified and added to its Webmaster Tools?
White Hat / Black Hat SEO | RizwanAkbar
-
Dynamic content boxes: how can I use them without getting a duplicate content penalty?
Hi everybody, I am starting a project with a travel website which has some standard category pages like Last Minute, Offers, Destinations, Vacations, Fly + Hotel. Every category contains a lot of destinations, each with its own landing page: Last Minute New York, Last Minute Paris, Offers New York, Offers Paris, etc. My question is: I am trying to simplify my job by writing some dynamic content boxes for Last Minute, Offers and the other categories, changing only the destination city (Rome, Paris, New York, etc.), repeated X times in X different combinations inside the content box. This would greatly simplify content writing for the main generic landing pages of each category, but I'm worried about getting penalized for duplicate content. Do you think my solution could work? If not, what is your suggestion? Is there a rule for classifying content as duplicate (for example, the number of identical words in a row)? Thanks in advance for your help! A.
White Hat / Black Hat SEO | OptimizedGroup
-
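On the dynamic content boxes question above: one common approach is to hand-write a few templates per category and pick one deterministically per destination, so each landing page shows a stable but varied box. A minimal sketch; all template text and city names are invented for the example:

```python
import zlib

# Illustrative only: hand-written templates for one category. A
# deterministic pick means the same page shows the same box on
# every crawl.
TEMPLATES = [
    "Grab a last-minute deal to {city}: flights and hotels updated daily.",
    "Planning a trip to {city}? Browse today's best last-minute offers.",
    "Save on {city} getaways with our freshest last-minute packages.",
]

def content_box(city, category="last-minute"):
    # crc32 is stable across runs, unlike Python's randomized str hash()
    idx = zlib.crc32(f"{category}:{city}".encode()) % len(TEMPLATES)
    return TEMPLATES[idx].format(city=city)

print(content_box("Rome"))
```

This only reduces overlap between pages; the shared skeleton is still duplicated, so the more hand-written templates and variable fields, the safer.
-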
Is it possible to use Google Authorship in an online shop?
Today I installed Google Authorship on my WordPress blog and I would like to know if it's possible to implement it in my OpenCart online shop. I am not interested in rich snippets because I have 9k products and 90% of them don't have sales or reviews.
White Hat / Black Hat SEO | mozismoz
-
Using a geolocation service to serve different banners on the homepage. Dangers? Best practices?
Hello, our website is used by customers in more than 100 countries. Because the countries we serve are so many, we are using one single domain and homepage, without country-specific content. Now, we are considering using a geolocation service to identify the customer's location and then change the contents of one banner on the homepage accordingly. Might this be dangerous from an SEO perspective? If yes, any suggestion on how we can implement this to avoid troubles and penalties from the search engines? Thanks in advance for any help, Dario
White Hat / Black Hat SEO | Darioz
-
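On the geolocation banner question above: a commonly suggested safeguard is to vary only the banner and always fall back to a default for visitors you cannot locate, so crawlers see the same page as everyone else. A hedged sketch; the banner file names are invented, and the country code is assumed to come from your geolocation service:

```python
# Assumption: your geolocation service hands you an ISO country code
# (or nothing, if the lookup failed). Banner names are invented.
BANNERS = {
    "IT": "banner-it.jpg",
    "DE": "banner-de.jpg",
    "JP": "banner-jp.jpg",
}
DEFAULT_BANNER = "banner-default.jpg"

def pick_banner(country_code=None):
    # None means the geo lookup failed (often the case for crawlers);
    # always fall back to the default banner in that case.
    if not country_code:
        return DEFAULT_BANNER
    return BANNERS.get(country_code.upper(), DEFAULT_BANNER)
```

Keeping everything else on the page identical for all visitors is what keeps this on the safe side of cloaking.
-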
How does Google decide what content is "similar" or "duplicate"?
Hello all, I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this: http://www.eteach.com/Employer.aspx?EmpNo=26626 http://www.eteach.com/Employer.aspx?EmpNo=36986 and Google is classing all of these pages as similar content, which may result in a bunch of them being de-indexed. Now although they all look rubbish, some of them are ranking on search engines, and looking at the traffic on a couple of these, it's clear that people who find these pages want more information on the school (because everyone seems to click on the local information tab on the page). So I don't want to just get rid of all these pages; I want to add content to them. But my question is... if I were to make up, say, 5 templates of generic content with different fields being replaced with the school's name, location and headteacher's name so that they vary from other pages, would this be enough for Google to realise that they are not similar pages and to stop classing them as duplicates? e.g. [School name] is a busy and dynamic school led by [headteacher's name], achieving excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities; we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards. Something like that... Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
White Hat / Black Hat SEO | Eteach_Marketing
-
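On the "similar or duplicate" question above: Google's actual similarity test is not public, but duplicate detection is commonly described in terms of word shingles, and a quick self-check along those lines shows why swap-a-few-fields templates stay similar: every run of n words that does not touch a replaced field is an identical shingle. A rough sketch:

```python
def shingles(text, n=4):
    """All overlapping n-word sequences ('shingles') in text, as a set."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(a, b, n=4):
    """Share of n-word shingles the two texts have in common (0.0-1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Two template instances that differ only in name, location and headteacher still share most of their 4-word shingles, which is exactly the pattern a duplicate detector is built to catch.
-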
Advice on using the disavow tool to remove hacked website links
Hey Everyone, Back in December, our website suffered an attack which created links to other hacked websites with anchor text such as "This is an excellent time to discuss symptoms, fa", "Open to members of the nursing/paramedical profes", "The organs in the female reproductive system incl". The links were only visible when looking at the cache of the page. We got these links removed and removed all traces of the attack, such as pages which were created in their own directory on our server. Three months later I'm finding websites linking to us with similar anchor text to the ones above; however, they're linking to the pages that were created on our server when we were attacked, and those have since been removed. So one of my questions is: does this affect our site? We've seen some of our best-performing keywords drop over the last few months and I have a feeling it's due to these spammy links. Here's a website that links to us: http://www.fashion-game.com/extreme/blog/page-9 If you view source or look at the cached version then you'll find a link right at the bottom left corner. We have 268 of these links from 200 domains. Contacting these sites to have the links removed would be a very long process, as most of them probably have no idea that those links even exist, and I don't have the time to explain to each one how to remove the hacked files etc. I've been looking at using the Google Disavow tool to solve this problem but I'm not sure if it's a good idea or not. We haven't had any warnings from Google about our site being spam or having too many spam links, so do we need to use the tool? Any advice would be very much appreciated. Let me know if you require more details about our problem.
White Hat / Black Hat SEO | blagger
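On the disavow question above: if disavowing does turn out to be warranted, the file itself is just a plain-text .txt uploaded through Search Console's disavow tool, with one URL or domain: entry per line and lines starting with # treated as comments. A sample using the domain mentioned in the question (entries are illustrative, not a recommendation to disavow that site):

```text
# Links created by the December hack, pointing at since-removed pages
domain:fashion-game.com
# Or disavow single pages instead of whole domains:
http://www.fashion-game.com/extreme/blog/page-9
```

With 200 domains, the domain: form keeps the file short and catches any further pages those sites add later.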