Duplicate Content Identification Tools
-
Does anyone have a recommendation for a good tool that can identify which elements on a page are duplicated content? I use Moz Analytics to determine which pages have the duplicated content on them, but it doesn't say which pieces of text or on-page elements are in fact considered to be duplicate.
Thanks Moz Community in advance!
-
Thank you. These steps are a part of our process.
-
Here are some guidelines from Google Webmasters Help on duplicate content, with tips for resolving issues.
-
Yes. I also agree that CopyScape is better for plagiarism. I am also reviewing the canonical tags we have in place for these pages. I am trying to look at the flagged pages from a few different angles to understand why they are being marked with 'duplicate content' warnings on our analytics platform, so that we can create a process of checks for any future warnings.
-
I use CopyScape, but it's more of a plagiarism tool than an actual duplicate content identifier. I say that because a few identical lines of text on a page don't mean Google will remove it from the SERPs. Generally, duplicated text has to make up a substantial portion of a webpage for the page to be considered duplicate content.
I would first dig into Moz Analytics and see WHY you are generating duplicate content before I would worry about what part of the page is duplicate.
- Have you set canonicals on your pages?
- Does your site produce session IDs?
- Do you have pagination?
- Are you copying and pasting text from page to page to fill up your site?
Google has said time and time again that duplicate content issues are rarely a penalty. It is more about Google knowing which page it should rank and which page it should not. Take a look at why you are getting the duplicate content issue, and then we can help you resolve it or give advice on what to do next.
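For the first two questions in the checklist above, a rough local check is possible before digging further. The following is only a sketch in Python using the standard library: `find_canonical` and `strip_session_params` are hypothetical helper names, and the `SESSION_PARAMS` set is an assumed list of session parameter names that you would need to adapt to your own site.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Assumed parameter names; replace with whatever your platform uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_params(url: str) -> str:
    """Drop session-style query parameters so URL variants can be compared."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

If two flagged URLs become identical after `strip_session_params`, or if `find_canonical` returns None for either of them, that is a hint the warning comes from URL variants rather than from copied text.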
-
Copyscape.com will tell you if you have duplicate content. If you have a big site with loads of pages, I'd buy credits, or you'll have difficulty because the free version only lets you check a few pages per day (I can't remember what the limit is). With the paid version you can upload your XML sitemap(s) and it'll check all the pages in that file. The report will then highlight the bits of copy that are duplicated.
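If you just want a rough idea of which passages two of your own pages share, something like the sketch below can be run locally on the visible text of each page. It is not how Copyscape or Moz work internally; it simply uses Python's `difflib` to compare word sequences. The function names and the `min_words` threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import List


def shared_passages(text_a: str, text_b: str, min_words: int = 8) -> List[str]:
    """Return runs of at least min_words consecutive words found in both texts."""
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False stops difflib from ignoring very common words on long pages.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            passages.append(" ".join(words_a[block.a:block.a + block.size]))
    return passages


def overlap_ratio(text_a: str, text_b: str) -> float:
    """Rough similarity of the two texts, from 0.0 (nothing shared) to 1.0 (identical)."""
    matcher = SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False)
    return matcher.ratio()
```

Feed it the extracted body text of two flagged pages: `shared_passages` lists the longer runs they have in common (often boilerplate such as shipping notices or footers), and `overlap_ratio` gives a single number for how much of the two pages overlaps.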
Related Questions
-
Duplicate page titles and Content in Woocommerce
Hi Guys, I'm new to Moz and really liking it so far!
On-Page Optimization | | jeeyer
I run an eCommerce site on WordPress + WooCommerce and of course use Yoast for SEO optimisation. I've got a question about my first crawl report, which showed over 600 issues! 😐 I've read that this happens quite often (http://moz.com/blog/setup-wordpress-for-seo-success). Most of them are categorized under:
1. Duplicate Page Titles or;
2. Duplicate Page Content.
Duplicate Page Titles:
These are almost only product category pages and product tags. Is this problem solved by giving them the right SEO SERP? I see that a lot of categories don't have a proper SEO SERP set up in Yoast! Do I need to add this to clear the issue, or do I need to change the actual title? And how about the product tags? Another point (a bit more off-topic): I've read here: http://moz.com/community/q/yoast-seo-plugin-to-index-or-not-to-index-categories that it's advised to noindex/follow categories and tags, but isn't that a weird thing to do for an eCommerce site?!
Duplicate Page Content:
The same goes here: almost only product categories and product tags are displayed as duplicate page content! When I check the results I can click on a blue button, for example "+ 17 duplicates", and that shows me (in this case) 17 URLs, but they are not related to the first in any way, so I'm not sure where to start here. Thanks for taking the time to help out!
Joost0 -
Identifying Duplicate Page Title
Moz weekly reports, among other things, the "Duplicate Page Title". How can I identify which two urls/pages have duplicate page titles? Is there any simple way to trace?
On-Page Optimization | | Sequelmed0 -
Duplicate Content - What can be duplicate in two different product pages.
I am having a hard time understanding how my 3 different product pages are showing up as duplicate content in a crawl. Some of my 21 different pages are being shown as duplicate content. Here are 3 of those: 1. http://champu.in/korn-rock-band-mens-round-neck-t-shirt-india 2. http://champu.in/stop-the-burning-mens-round-neck-t-shirt-india 3. http://champu.in/funny-t-shirts/absolut-punjabi-red-men-s-round-neck-t-shirt Can someone help me with this? Thanks in advance 🙂
On-Page Optimization | | sidjain4you0 -
Duplicate Content from WordPress Category Base?
I recently changed my category base in WordPress and instead of redirecting or deleting the old base, WordPress kept the content up. So I now have duplicate content on two different urls - one on the old category base, one on the new category base. How should I handle this situation? The site is only a couple weeks old, if that makes any difference.
On-Page Optimization | | JABacchetta0 -
How to avoid duplicates when URL and content changes during the course of a day?
I'm currently facing the following challenge in the newspaper industry: the content and title of some (featured) articles change a couple of times during a normal day. The CMS is set up so each article can be found using only its specific ID (e.g. domain.tld/123). A normal article looks like this: domain.tld/some-path/sub-path/i-am-the-topic,123 Now the article gets changed, and with it the topic. It looks like this now: domain.tld/some-path/sub-path/i-am-the-new-topic,123 I cannot tell the writers that they can no longer change the article as they wish. I could implement canonicals pointing to the short URL (domain.tld/123). I could try to change the URLs to something like domain.tld/some-path/sub-path/123. Then we would lose keywords in the URL (which, as far as I know, is not that important as a ranking factor; rather as a CTR factor). If anyone has experience with this, sharing it would be greatly appreciated. Thanks, Jan
On-Page Optimization | | jmueller0 -
Duplicate Page Title
I have a dating site, and it's got a lot of duplicate page titles. Most of them are the language buttons for users to view the site in their language, but I think it's obvious that the buttons don't have anything to do with it. I'm thinking that the page title is basically a description of what the site is, for example "online-dating". Is this it? Please tell me, in terms for a dummy, how to fix it.
On-Page Optimization | | clickit2getwithit0 -
Software/tool for Content Inventory
Is there any tool out there which can crawl a section of a website (or whole website) and gives a list of urls by subsections? I have tried PowerMapper but it only gives visual map, not a list. Thanks!
On-Page Optimization | | StickyRiceSEO0 -
Prevent indexing of dynamic content
Hi folks! I discovered a bit of an issue with a client's site. Primarily, the site consists of static HTML pages; however, within one page (a car photo gallery), a line of php coding: dynamically generates 100 or so pages comprising the photo gallery, all with the same page title and meta description. The photo gallery script resides in the /gallery folder, which I attempted to block via robots.txt, to no avail. My next step will be to include a: within the head section of the HTML page, but I am wondering if this will stop the bots dead in their tracks, or will they still be able to pick up on the pages generated by the call to the PHP script residing a bit further down on the page? Dino
On-Page Optimization | | SCW0