Checking for content duplication against content on your own site.
-
We are currently trying to rewrite our product descriptions and I'm afraid some of the salespeople that are writing the descriptions are plagiarizing one-another's writing. Is there a content duplication checker that will allow you to check a piece of writing against a specific site rather than all of the web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to simply write a search algo that when you create a new article and before it goes live, it takes sections of your content and looks for matches/duplicates. You can set a requirement that it has to match on a minimum of a 4 to 5 word string and other such limitations to make sure you are not matching too many items. It will take a few tests to find a sweet spot of too many matches vs not enough.
With 17K pages, this is the only way you can really do this in an efficient way, you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
Good luck.
-
I have two dev servers, one of which it is possible to do what you're talking about but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't then I would have to wait an additional week to see results.
The crawl diagnostics also limits the amount of pages it will crawl on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled is little better than 50/50.
Also, the crawl diagnostics only tell you what pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that the crawl diagnostics, nor any product that SEOmoz provides, is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you might have a test or dev environment.
If not create one.
if you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller scale site. We currently have over 17,000 product pages so I can't really use either method. It's looking like a google custom search is the best bet even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low tech ways to do it....
If you have Win 7 the searching has improved greatly - just move all files to a local machine - and search the directory you placed in for the content you are wanting to check - it will give all files that contain the words. (but can become overloading)
If you have dreamweaver or other enterprise level editor - almost all have a site search function to where you can search/profile code/text and have it find one by one which pages contain the searched terms - or globally list them.
Other than that, probably a custom script -or a google search for an HTML profiler might help?
Shane
-
That's for pages that are already published and crawled. I want to able to search my site for entire sentences and/or paragraphs of text that I have yet to publish so I can make sure it's not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than reactive.
-
The duplicate content from you website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content with tagging and categories
Hello, Moz is showing that a site has duplicate content - which appears to be because of tags and categories. It is a relatively new site, with only a few blog publications so far. This means that the same articles are displayed under a number of different tags and categories... Is this something I should worry about, or just wait until I have more content? The 'tag' and 'category' pages are not really pages I would expect or aim for anyone to find in google results anyway. Would be glad to here any advice / opinions on this Thanks!
On-Page Optimization | | wearehappymedia1 -
Two sites, one with a ccTLD domain, the other with TLD domain, same content
Hi there! I have a site which can be accessed with two different domains: one ccTLD for Spain: www.piensapiensa.es one TLD www.piensapiensa.com Should I take care of something regarding SEO? I have also a redirection from www.piensapiensa.com to piensapiensa.com. I have set up them in webmasters tools individually, with the same sitemap obviously. Thanks in advanced.
On-Page Optimization | | juanmiguelcr0 -
Why is my site not ranking?
Could you please help me understand what is wrong with this site: www.award-certificates.com It simply isn't ranking after about 3 years and I am not so sure what I can do to improve it.
On-Page Optimization | | nicolebd0 -
Can you have more than 1 site on the first page if site look and content is completely different but keywords are the sam.
I have a client that wants to build another completely different site than his main site and optimize it to have 2 websites on the first page for his keywords. The content and look and feel of the website would be completely different. One of his competitors is doing it and getting away with it. What is your advice.
On-Page Optimization | | Roots70 -
Duplicate Content- Best Practise Usage of the canonical url
Canonical urls stop self competition - from duplicate content. So instead of a 2 pages with a rank of 5 out of 10, it is one page with a rank of 7 out of 10.
On-Page Optimization | | WMA
However what disadvantages come from using canonical urls. For example am I excluding some products like green widet, blue widget. I have a customer with 2 e-commerce websites(selling different manufacturers of a type jewellery). Both websites have massive duplicate content issues.
It is a hosted CMS system with very little SEO functionality, no plugins etc. The crawling report- comes back with 1000 of pages that are duplicates. It seems that almost every page on the website has a duplicate partner or more. The problem starts in that they have 2 categorys for each product type, instead of one category for each product type.
A wholesale category and a small pack category. So I have considered using a canonical url or de-optimizing the small pack category as I believe it receives less traffic than the whole category. On the original website I tried de- optimizing one of the pages that gets less traffic. I did this by changing the order of the meta title(keyword at the back, not front- by using small to start of with). I also removed content from the page. This helped a bit. Or I was thinking about just using a canonical url on the page that gets less traffic.
However what are the implications of this? What happens if some one searches for "small packs" of the product- will this no longer be indexed as a page. The next problem I have is the other 1000s of pages that are showing as duplicates. These are all the different products within the categories. The CMS does not have a front office that allows for canonical urls to be inserted. Instead it would have to be done going into the html of the pages. This would take ages. Another issue is that these product pages are not actually duplicate, but I think it is because they have such little content- that the rodger(seo moz crawler, and probably googles one too) cant tell the difference.
Also even if I did use the canonical url - what happened if people searched for the product by attributes(the variations of each product type)- like blue widget, black widget, brown widget. Would these all be excluded from Googles index.
On the one hand I want to get rid of the duplicate content, but I also want to have these pages included in the search. Perhaps I am taking too idealistic approach- trying to optimize a website for too many keywords. Should I just focus on the category keywords, and forget about product variations. Perhaps I look into Google Analytics, to determine the top landing pages, and which ones should be applied with a canonical. Also this website(hosted CMS) seems to have more duplicate content issues than I have seen with other e-commerce sites that I have applied SEO MOZ to On final related question. The first website has 2 landing pages- I think this is a techical issue. For example www.test.com and www.test.com/index. I realise I should use a canonical url on the page that gets less traffic. How do I determine this? (or should I just use the SEO MOZ Page rank tool?)0 -
What is the best way to manage industry required duplicate Important Safety Information (ISI) content on every page of a site?
Hello SEOmozzer! I have recently joined a large pharmaceutical marketing company as our head SEO guru, and I've encountered a duplicate content related issue here that I'd like some help on. Because there is so much red tape in the pharmaceutical industry, there are A LOT of limitations on website content, medication and drug claims, etc. Because of this, it is required to have Important Safety Information (ISI) clearly stated on every page of the client's website (including the homepage). The information is generally pretty lengthy, and in some cases is longer than the non-ISI content on each page. Here is an example: http://www.xifaxan.com/ All content under the ISI header is required on each page. My questions are: How will this duplicated content on each page affect our on-page optimization scores in the eyes of search engines? Is Google seeing this simply as duplicated content on every page, or are they "smart" enough to understand that because it is a drug website, this is industry standard (and required)? Aside from creating more meaty, non-ISI content for the site, are there any other suggestions you have for handling this potentially harmful SEO situation? And in case you were going to suggest it, we cannot simply have an image of the content, as it may not be visible by all internet users. We've already looked into that 😉 Thanks in advance! Dylan
On-Page Optimization | | MedThinkCommunications0 -
Duplicate page content & title for www.mydomain.com and www.mydomain.com/index.php?
Hi, First post so please be gentle! My Crawl Diagnostics Summary is showing an error relating to duplicate page content and duplicate page title for www.mydomain.com and www.mydomain.com/index.php which are, in my view, the same thing/page? Could anyone shed any light please? Thanks Carl
On-Page Optimization | | Carl2870 -
Duplicate Content
We offer Wellness programs for dogs and cats. A lot of the information is the same except for specifics that relate to young vs. senior pets. I have these different pages: Senior Wellness Kitten Wellness Puppy Wellness Adult Wellness Can each page have approx. 75% of the same text? Or should I rewrite each page so the information (though the same) appears unique.
On-Page Optimization | | PMC-3120870