What tools do you use to find scraped content?
-
This hasn’t been an issue for our company so far, but I like to be proactive. What tools do you use to find sites that may have scraped your content?
Looking forward to your suggestions.
Vic
-
Oh, this belongs to a different thread: http://moz.com/community/q/chinese-site-ranking-for-our-brand-name-possible-hack
-
Is this part of the original conversation, or something else? Which sites are these?
-
I'm not sure we have been scraped as such though, because the site in question has different content.
It looks as though the offending site has hacked another site (which redirects to the offending site) but the hacked site is ranking for our brand name. Our homepage has lost all rankings it had (our category and product pages seem fine) and has essentially disappeared.
Can anyone else shed any light?
-
Siteliner (Copyscape's big brother) is really great and what we use first (plus I have a bookmarklet for it to make it faster & easy to use.)
Also use Linda's method of taking a bit of content in quotes. Easiest way to show an ecommerce client how much work they're going to require - take three product descriptions into Google, watch the magic, and explain that would happen across all 15,000 products.
-
I spot check on a regular basis by taking a unique chunk out of a post, putting it in quotes, and doing a Google search on it. It's not comprehensive, but it is free. [And the main problems we have had with scrapers have been with sites that have taken huge portions of our content, not just an article or two, and a spot check roots those out.]
-
Thanks, Chris & Jonathan. I will look into Copyscape. Good stuff!
-
Yep, Copyscape is what I use. I use a wordpress plugin that uses the copyscape API and just check my main content every month or so with a simple click.
-
Copyscape works well for us. You can scan a couple of pages for free, and then it's $0.05/page after that.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Scraping Website and Using Our Clients Info
One of our clients on Moz has noticed that another website has been scraping their website and pulling lots of their content without permission. We would like to notify Google about this company but are not sure if that is the right remedy to correct the problem. They appear in search results on Google using the client's name so they seem to be use page titles etc with the client's name in them. Several of the SERP links link to their own website but it pulls in our client's web page. Was hoping anyone could perhaps provide some additional options on how to attack this problem?
White Hat / Black Hat SEO | | InTouchMK0 -
Duplicate content warning: Same page but different urls???
Hi guys i have a friend of mine who has a site i noticed once tested with moz that there are 80 duplicate content warnings, for instance Page 1 is http://yourdigitalfile.com/signing-documents.html the warning page is http://www.yourdigitalfile.com/signing-documents.html another example Page 1 http://www.yourdigitalfile.com/ same second page http://yourdigitalfile.com i noticed that the whole website is like the nealry every page has another version in a different url?, any ideas why they dev would do this, also the pages that have received the warnings are not redirected to the newer pages you can go to either one??? thanks very much
White Hat / Black Hat SEO | | ydf0 -
Separating the syndicated content because of Google News
Dear MozPeople, I am just working on rebuilding a structure of the "news" website. For some reasons, we need to keep syndicated content on the site. But at the same time, we would like to apply for google news again (we have been accepted in the past but got kicked out because of the duplicate content). So I am facing the challenge of separating the Original content from Syndicated as requested by google. But I am not sure which one is better: *A) Put all syndicated content into "/syndicated/" and then Disallow /syndicated/ in robots.txt and set NOINDEX meta on every page. **But in this case, I am not sure, what will happen if we will link to these articles from the other parts of the website. We will waste our link juice, right? Also, google will not crawl these pages, so he will not know about no indexing. Is this OK for google and google news? **B) NOINDEX meta on every page. **Google will crawl these pages, but will not show them in the results. We will still loose our link juice from links pointing to these pages, right? So ... is there any difference? And we should try to put "nofollow" attribute to all the links pointing to the syndicated pages, right? Is there anything else important? This is the first time I am making this kind of "hack" so I am exactly sure what to do and how to proceed. Thank you!
White Hat / Black Hat SEO | | Lukas_TheCurious1 -
Do I lose link juice if I have a https site and someone links to me using http instead?
We have recently launched a https site which is getting some organic links some of which are using https and some are using http. Am I losing link juice on the ones linked using http even though I am redirecting or does Google view them the same way? As most people still use http naturally will it look strange to google if I contact anyone who has given us a link and ask them to change to https?
White Hat / Black Hat SEO | | Lisa-Devins0 -
How does Google decide what content is "similar" or "duplicate"?
Hello all, I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this: http://www.eteach.com/Employer.aspx?EmpNo=26626 http://www.eteach.com/Employer.aspx?EmpNo=36986 and Google is classing all of these pages as similar content which may result in a bunch of these pages being de-indexed. Now although they all look rubbish, some of them are ranking on search engines, and looking at the traffic on a couple of these, it's clear that people who find these pages are wanting to find out more information on the school (because everyone seems to click on the local information tab on the page). So I don't want to just get rid of all these pages, I want to add content to them. But my question is... If I were to make up say 5 templates of generic content with different fields being replaced with the schools name, location, headteachers name so that they vary with other pages, will this be enough for Google to realise that they are not similar pages and will no longer class them as duplicate pages? e.g. [School name] is a busy and dynamic school led by [headteachers name] who achieve excellence every year from ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, we encourage all of our pupils to “Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards. Something like that... Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
White Hat / Black Hat SEO | | Eteach_Marketing0 -
Duplicate Content
Hi, I have a website with over 500 pages. The website is a home service website that services clients in different areas of the UK. My question is, am I able to take down the pages from my URL, leave them down for say a week, so when Google bots crawl the pages, they do not exist. Can I then re upload them to a different website URL, and then Google wont penalise me for duplicate content? I know I would of lost juice and page rank, but that doesnt really matter, because the site had taken a knock since the Google update. Thanks for your help. Chris,
White Hat / Black Hat SEO | | chrisellett0 -
"take care about the content" is it always true?
Hi everyone, I keep reading answer ,in reference to ranking advice, in wich the verdict is always the same: "TAKE CARE ABOUT THE CONTENT INSTEAD OF PR", and phrases like " you don't have to waste your time buying links, you have first of all to engage your visitors. ideally it works but not when you have to deal with small sites and especially when you are going to be ranked for those keywords where there's not too much to write. i'll give you an example still unsolved: i've got a client who just want to be ranked first for his flagship store, now his site is on the fourth position and the first ranked is a site with no content and low authority but it has the excact keyword match domain. tell me!!! what kind of content should i produce in order to be ranked for the name of the shop and the city?? the only way is to get links.... or to stay forth..... if you would like to help me, see more details below: page: http://poltronafraubrescia.zenucchi.it keyword: poltrona frau brescia competitor ranked first: http://turra.poltronafraubrescia.it/ competiror ranked second: http:// poltronafraubrescia.com/
White Hat / Black Hat SEO | | guidoboem0 -
Content box (on page content) and titles Google over-optimization penalty?
We have a content box at the bottom of our website with a scroll bar and have posted a fair bit of content into this area (too much for on page) granted it is a combination of SEO content (with links to our pages) and informative but with the over optimization penalty coming around I am a little scared if this will result in a problem for us. I am thinking of adopting the process of this website HERE with the content behind a more information button that drops down, would this be better as it could be much more organised and we will be swopping out to more helpful information than the current 50/50 (SEO – helpful content) or will it be viewed the same and we might as well leave it as is and lower the amount of repetition and links in the content. Also we sell printed goods so our titles may be a bit over the top but they are bring us a lot of converting traffic but again I am worried about the new Google release this is an example of a typical title (only an example not our product page) Banner Printing | PVC Banners | Outdoor Banners | Backdrops | Vinyl Banners | Banner Signs Thank you for any help with these matters.
White Hat / Black Hat SEO | | BobAnderson0