Checking for content duplication against content on your own site.
-
We are currently trying to rewrite our product descriptions and I'm afraid some of the salespeople that are writing the descriptions are plagiarizing one-another's writing. Is there a content duplication checker that will allow you to check a piece of writing against a specific site rather than all of the web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to simply write a search algo that when you create a new article and before it goes live, it takes sections of your content and looks for matches/duplicates. You can set a requirement that it has to match on a minimum of a 4 to 5 word string and other such limitations to make sure you are not matching too many items. It will take a few tests to find a sweet spot of too many matches vs not enough.
With 17K pages, this is the only way you can really do this in an efficient way, you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
Good luck.
-
I have two dev servers, one of which it is possible to do what you're talking about but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't then I would have to wait an additional week to see results.
The crawl diagnostics also limits the amount of pages it will crawl on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled is little better than 50/50.
Also, the crawl diagnostics only tell you what pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that the crawl diagnostics, nor any product that SEOmoz provides, is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you might have a test or dev environment.
If not create one.
if you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller scale site. We currently have over 17,000 product pages so I can't really use either method. It's looking like a google custom search is the best bet even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low tech ways to do it....
If you have Win 7 the searching has improved greatly - just move all files to a local machine - and search the directory you placed in for the content you are wanting to check - it will give all files that contain the words. (but can become overloading)
If you have dreamweaver or other enterprise level editor - almost all have a site search function to where you can search/profile code/text and have it find one by one which pages contain the searched terms - or globally list them.
Other than that, probably a custom script -or a google search for an HTML profiler might help?
Shane
-
That's for pages that are already published and crawled. I want to able to search my site for entire sentences and/or paragraphs of text that I have yet to publish so I can make sure it's not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than reactive.
-
The duplicate content from you website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Best Site Architecture tool for analysis my & competitors site?
Hello All, I am really confused with my current architecture for Ecommerce site, can you please suggest any tool or software where I can analysis mine and competitors site architecture ? Thanks!
On-Page Optimization | | pragnesh96390 -
Duplication in landing page
This is driving me mad, I have a site that for some reason google and moz pick up the landing page as a duplicate. They see "mysite/" and "mysite/index.html" as two different pages and giving me warnings for duplication. I have no 301 included at this time and I am using foundation as the base. This is occurring both on a localhost test bed and live....... anyone got an idea how to correct.
On-Page Optimization | | AndyBirtles0 -
How to use canonical with mobile site to main site
I am pretty sure that the mobile version of the main site needs to be the same canonical link from what I understand. I am trying to find good docuementation that supports this. Even better if its from Google or Matt Cutts. I have a main domain like http://www.mydomain.com the mobile version of this is http://www.mydomain.com/m/ Should my canonical be rel="canonical" href="http://www.mydomain.com"/> for both these pages?
On-Page Optimization | | cbielich0 -
Duplication issue on my website
hi I have a cms website with 2000 pages.my problem is that 1. www.test.com/abc.html 2. www.test.com/abc.html?gallery?123testing it showing duplication page in me seomoz error list. It is a single page. Please suggest solution for it
On-Page Optimization | | wmsindia0 -
Duplicate content because of content scrapping - please help
We manage brands websites in a very competitive industry that have thousands of affiliate links We see that more and more websites (mainly affiliates websites) are scrapping our brand websites content and it generate many duplicate content (but most of them link to us back with an affiliate link). Our brand websites still rank for any sentence in brackets you search in Google, Will this duplicate content hurt our brand websites ? If yes, should we take some preventive actions ? We are not able to add ongoing UGC or additional text to all our duplicate content and trying to stop those websites of stealing our content is like playing cat and mouse... Thanks for your advices
On-Page Optimization | | Tit0 -
Duplicate Page Content and Duplicate Page Title
Hi All, I'm new in SEOMoz and have some questions after I have already spend 2-3 days trying to resolve the problems identified from Crawling one of my clients websites. I get quite a lot of Duplicate Page Conntent and Page Titles warnings and trying to find a workaround through the forums and posts. I continuously get this error on most of my pages: URL: http://domain.com/benefits with the same Page but with a WWW in front URL: http://www.domain.com/benefits Any advice will be highly appreciated. Thanks, Athos
On-Page Optimization | | athosk0 -
Articles on our own site?
Hi Guy's Is it better to have articles on our own website under a news section, with keyword phrases that point to relevant pages or is it better to post them on our blog (that is seperate, we use blogger) and point them to relevant pages on our site? or maybe a bit of both? Thanks Daniel
On-Page Optimization | | LushDuck0 -
Magento Layered Navigation & Duplicate Content
Hello Dear SeoMoz, I would like to ask your help with something that I am not sure off. Our ecommerce web site is built with Magento. I have found many problems so far and I know that there will be many more in the future. Currently, I am trying to find the best way to deal with the duplicate content that is produced from the layered navigation (size, gender etc). I have done a lot of research so far in order to understand which might be the best practice and I found the following practices: **Block layered navigation URLSs from the Google Webmaster Tools (**Apparently this works for Google Only). Block these URLs with the robots.txt file Make links no-follow **Make links JavaScript from Magento *** Avoid including these links in the xml site map. Avoid including these link in the A-Z Product Index. Canonical tag Meta Tags (noindex, nofollow) Question If I turn the layered navigation links into JavaScript links from the Magento Admin, the layered navigation links are still found by the crawlers but they look like that: | http://www.mysite.com/# instead of: http://www.mysite.com/girls-basics.html?gender_filte... | Can these new URLS (http://www.mysite.com/# ) solve the duplicate content problems with the layered navigation or do I need to implement other practices too to make sure that everything is done right. Kind Regards Stefanos Anastasiadis
On-Page Optimization | | alexandalexaseo0