Checking for content duplication against content on your own site.
-
We are currently rewriting our product descriptions, and I'm afraid some of the salespeople who are writing them are plagiarizing one another's work. Is there a content duplication checker that lets you check a piece of writing against a specific site rather than the whole web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to write a simple search algorithm that runs when you create a new article, before it goes live: it takes sections of your content and looks for matches/duplicates among your existing articles. You can require a match on a string of at least 4 to 5 words, and add other such limits, to make sure you are not matching too many items. It will take a few tests to find the sweet spot between too many matches and not enough.
With 17K pages, this is the only way you can really do this efficiently; you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
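To give your developer a starting point, here is a rough sketch of that kind of check in Python. It is only an illustration under some assumptions: the function names (`shingles`, `build_index`, `find_overlaps`), the 5-word window, and the sample descriptions are all made up, and a real version would read the published descriptions from your CMS rather than from a dictionary.

```python
import re
from collections import defaultdict


def shingles(text, n=5):
    """Return the set of n-word sequences in text, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(pages, n=5):
    """Map each n-word sequence to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for s in shingles(text, n):
            index[s].add(page_id)
    return index


def find_overlaps(draft, index, n=5, min_shared=1):
    """Return {page_id: shared sequences} for pages sharing >= min_shared sequences with draft."""
    hits = defaultdict(list)
    for s in shingles(draft, n):
        for page_id in index.get(s, ()):
            hits[page_id].append(s)
    return {p: sorted(v) for p, v in hits.items() if len(v) >= min_shared}


# Hypothetical sample data standing in for published product descriptions.
pages = {
    "widget-a": "This rugged widget is built to last through years of daily use.",
    "widget-b": "A lightweight gadget designed for travel and easy storage.",
}
index = build_index(pages)
draft = "Our new model is built to last through years of heavy work."
print(find_overlaps(draft, index))
```

The index is built once over all existing pages, so checking a new draft only costs one lookup per 5-word sequence in the draft, which should stay manageable at 17K pages. Raising `n` or `min_shared` is how you would tune the "too many matches vs. not enough" trade-off, and the returned sequences are the raw material for a reporting layer.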
Good luck.
-
I have two dev servers, and it is possible to do what you're talking about on one of them, but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week, which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't, I would have to wait an additional week to see results.
The crawl diagnostics also limit the number of pages crawled on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chance of that page being crawled is little better than 50/50.
Also, the crawl diagnostics only tell you what pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that neither the crawl diagnostics nor any other product that SEOmoz provides is an answer to my issue. If I thought it was, I would already be using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you might have a test or dev environment.
If not, create one.
If you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project, you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller-scale site. We currently have over 17,000 product pages, so I can't really use either method. It's looking like a Google Custom Search is the best bet, even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low-tech ways to do it...
If you have Win 7, the searching has improved greatly: just move all the files to a local machine and search the directory you placed them in for the content you want to check. It will list all files that contain the words (but the results can become overwhelming).
If you have Dreamweaver or another enterprise-level editor, almost all of them have a site search function that lets you search code/text and either step through the pages containing the searched terms one by one or list them all at once.
Other than that, a custom script, or a Google search for an HTML profiler, might help.
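For the custom script route, something along these lines might do as a first pass. It is a sketch, not a finished tool: it assumes the pages have been copied to a local folder as `.html` files, and the names `normalize` and `find_phrase` and the demo files are made up for illustration.

```python
import re
import tempfile
from pathlib import Path


def normalize(text):
    """Lowercase and collapse whitespace so line breaks don't hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_phrase(root, phrase, pattern="*.html"):
    """Return sorted paths (relative to root) of files whose text contains phrase."""
    needle = normalize(phrase)
    root = Path(root)
    matches = []
    for path in root.rglob(pattern):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            if needle in normalize(text):
                matches.append(path.relative_to(root).as_posix())
    return sorted(matches)


# Demo on a throwaway directory; the file names and text are made up.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.html").write_text("<p>Built to last\nthrough years of use.</p>", encoding="utf-8")
    Path(tmp, "b.html").write_text("<p>Light and easy to store.</p>", encoding="utf-8")
    demo_matches = find_phrase(tmp, "built to last through years")
print(demo_matches)
```

This only finds exact phrases (ignoring case and line breaks), and it will miss a sentence that is split up by HTML tags, so treat it as a quick check rather than a full duplicate detector.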
Shane
-
That's for pages that are already published and crawled. I want to be able to search my site for entire sentences and/or paragraphs of text that I have yet to publish so I can make sure it's not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than a reactive one.
-
The duplicate content from your website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.