Checking for content duplication against content on your own site.
-
We are currently rewriting our product descriptions, and I'm afraid some of the salespeople writing them are plagiarizing one another's work. Is there a content duplication checker that lets you check a piece of writing against a specific site rather than the whole web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to write a search routine that runs when you create a new article, before it goes live: it takes sections of your content and looks for matches/duplicates on the site. You can require a match on a minimum of a 4 to 5 word string, and add other such limits, to make sure you are not matching too many items. It will take a few tests to find the sweet spot between too many matches and not enough.
With 17K pages, this is the only way you can really do this efficiently; you need some IT support/development. They may also have to create a reporting layer to help you sift through the results.
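To give your developer a starting point, here is a rough Python sketch of that kind of check. It is only an illustration, not a finished tool: the function names (`shingles`, `build_index`, `find_duplicates`), the page ids, and the sample text are all made up, and it assumes the existing descriptions can be exported from the CMS as plain text. It splits every description into overlapping 5-word strings, indexes them, and reports which existing pages share strings with a draft, along with the shared strings themselves.

```python
import re
from collections import defaultdict


def shingles(text, n=5):
    """Return the set of overlapping n-word strings in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(pages, n=5):
    """Map each n-word string to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for s in shingles(text, n):
            index[s].add(page_id)
    return index


def find_duplicates(draft, index, n=5, min_shared=1):
    """Return {page_id: sorted shared strings} for pages sharing at least min_shared strings with draft."""
    hits = defaultdict(set)
    for s in shingles(draft, n):
        for page_id in index.get(s, ()):
            hits[page_id].add(s)
    return {p: sorted(m) for p, m in hits.items() if len(m) >= min_shared}


# Hypothetical example data: in practice `pages` would hold all published descriptions.
pages = {
    "a": "This rugged steel widget resists rust and lasts for many years.",
    "b": "A lightweight aluminum gadget for everyday use.",
}
index = build_index(pages)
draft = "Our new model resists rust and lasts for many years of use."
for page_id, shared in find_duplicates(draft, index).items():
    print(page_id, shared)
```

The two knobs to experiment with are `n` (the minimum word-string length) and `min_shared` (how many shared strings it takes to flag a page). Raising either one cuts down on false matches from common stock phrases; lowering them catches lighter rewording.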
Good luck.
-
I have two dev servers, and on one of them it is possible to do what you're talking about, but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week, which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't, I would have to wait an additional week to see results.
The crawl diagnostics also limit the number of pages they will crawl on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled are little better than 50/50.
Also, the crawl diagnostics only tell you which pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that neither the crawl diagnostics nor any other product that SEOmoz provides is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you might have a test or dev environment.
If not, create one.
If you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project, you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller-scale site. We currently have over 17,000 product pages, so I can't really use either method. It's looking like a Google Custom Search is the best bet, even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low-tech ways to do it...
If you have Win 7, the searching has improved greatly - just move all the files to a local machine and search the directory you placed them in for the content you want to check. It will list all files that contain the words (but the results can become overwhelming).
If you have Dreamweaver or another enterprise-level editor - almost all have a site search function where you can search/profile code/text and have it find, one by one, which pages contain the searched terms - or list them globally.
Other than that, probably a custom script - or a Google search for an HTML profiler might help?
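For the custom script route, something along these lines might do as a first pass. It is a rough Python sketch with made-up names (`normalize`, `files_containing`), assuming the pages have been copied to a local folder as .html, .htm or .txt files. It walks the folder and lists every file containing a given phrase, ignoring case and line breaks.

```python
import os
import re


def normalize(text):
    """Collapse runs of whitespace and lowercase, so line breaks and case don't hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def files_containing(root, phrase, extensions=(".html", ".htm", ".txt")):
    """Return a sorted list of files under root whose contents include phrase."""
    needle = normalize(phrase)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue  # skip images and other non-text files
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                if needle in normalize(f.read()):
                    matches.append(path)
    return sorted(matches)
```

Calling `files_containing("C:/site-copy", "some sentence from the draft")` (the folder path is just an example) returns the matching file paths. It searches the raw file text, so a phrase that is broken up by HTML tags in the source will not be found.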
Shane
-
That's for pages that are already published and crawled. I want to be able to search my site for entire sentences and/or paragraphs of text that I have yet to publish so I can make sure it's not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than a reactive one.
-
The duplicate content from your website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.