API for testing duplicate content
-
Does anyone know a service or API or php lib to compare two (or more) pages and to return their similiarity (Level-3-Shingles).
API would be greatly prefered.
-
Hey Erica,
thanks for your answer. What I need is a way to decide on-the-fly whether two pages are similar or not. If they are too similar I need to depublish or at least rel canonical one of those.
Best solution would be an API that takes 2 pages, but it seems as if I have to build it myself then.
Thanks for your efforts.
-
While I don't know of an API that does that, you can set up your site using the SEOmoz tools and our Crawl Diagnostics section does look for Duplicate Cotent.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Responsive Code Creating Duplicate Content Issue
Good morning, Our developers have recently created a new site for our agency. The site is responsive for mobile/tablets. I've just put the site through Screaming Frog and I've been informed of duplicate H2s. When I've looked at some of the page sources, there are some instances of duplicated H2s and duplicated content. These duplicates don't actually appear on the site, only in the code. When I asked the development guys about this, they advised this is duplicated because of the code for the responsive site. Will the site be negatively affected because of this? Not everything is duplicated, which leads me to believe it probably could have been designed better... but I'm no developer so don't know for sure. I've checked the code for other responsive sites and no duplicates can be found. Thanks in advance, Lewis
Technical SEO | | PeaSoupDigital0 -
Duplicate content in product listing
We have "duplicate content" warning in our moz report which mostly revolve around our product listing (eCommerce site) where various filters return 0 results (and hence show the same content on the page). Do you think those need to be addressed, and if so how would you prevent product listing filters that appearing as duplicate content pages? should we use rel=canonical or actually change the content on the page?
Technical SEO | | erangalp0 -
Image centric site and duplicate content issues
We have a site that has very little text, the main purpose of the site is to allow users to find inspiration through images. 1000s of images come to us each week to be processed by our editorial team, so as part of our process we select a subset of the best images and process those with titles, alt text, tags, etc. We still host the other images and users can find them through galleries that link to the process and unprocessed image pages. Due to the lack of information on the unprocessed images, we are having lots of duplicate content issues (The layout of all the image pages are the same, and there isn't any unique text to differentiate the pages. The only changing factor is the image itself in each page) Any suggestions on how to resolve this issue, will be greatly appreciated.
Technical SEO | | wedlinkmedia0 -
SEOMOZ and non-duplicate duplicate content
Hi all, Looking through the lovely SEOMOZ report, by far its biggest complaint is that of perceived duplicate content. Its hard to avoid given the nature of eCommerce sites that oestensibly list products in a consistent framework. Most advice about duplicate content is about canonicalisation, but thats not really relevant when you have two different products being perceived as the same. Thing is, I might have ignored it but google ignores about 40% of our site map for I suspect the same reason. Basically I dont want us to appear "Spammy". Actually we do go to a lot of time to photograph and put a little flavour text for each product (in progress). I guess my question is, that given over 700 products, why 300ish of them would be considered duplicates and the remaning not? Here is a URL and one of its "duplicates" according to the SEOMOZ report: http://www.1010direct.com/DGV-DD1165-970-53/details.aspx
Technical SEO | | fretts
http://www.1010direct.com/TDV-019-GOLD-50/details.aspx Thanks for any help people0 -
Avoiding duplicate content on product pages?
Hi, I'm creating a bunch of product pages for courses for a university and I'm concerned about duplicate content penalties. While the page names are different and some of the test is different, much of the text is the same between pairs of pages. I.e. a BA and an MA in a particular subject (say 'hairdressing' will have the same subject descriptions, school introduction paragraph, industry overview paragraph etc. 1. Is this a problem? In a site with 100 pages, if sets of 2 pages have about 50% identical content... 2. If it is a problem, is there anything I can do, other than rewrite the text? 3. From a search perspective, would both pages show up in search results in searches related to 'hairdressing courses' 'study hairdressing' etc? Thanks!
Technical SEO | | AISFM0 -
Issue: Duplicate Page Content
Hi All, I am getting warnings about duplicate page content. The pages are normally 'tag' pages. I have some blog posts tagged with multiple 'tags'. Does it really affect my site?. I am using wordpress and Yoast SEO plugin. Thanks
Technical SEO | | KLLC0 -
Duplicate content issue
Hi everyone, I have an issue determining what type of duplicate content I have. www.example.com/index.php?mact=Calendar,m57663,default,1&m57663return_id=116&m57663detailpage=&m57663year=2011&m57663month=6&m57663day=19&m57663display=list&m57663return_link=1&m57663detail=1&m57663lang=en_GB&m57663returnid=116&page=116 Since I am not an coding expert, to me it looks like it is a URL parameter duplicate content. Is it? At the same time "return_id" would makes me think it is a session id duplicate content. I am confused about how to determine different types of duplicate content, even by reading articles on Seomoz about it: http://www.seomoz.org/learn-seo/duplicate-content. Could someone help me on how to recognize different types of duplicate content? Thank you!
Technical SEO | | Ideas-Money-Art0 -
Duplicate Page Content
Hi within my campaigns i get an error "crawl errors found" that says duplicate page content found, it finds the same content on the home pages below. Are these seen as two different pages? And how can i correct these errors as they are just one page? http://poolstar.net/ http://poolstar.net/Home_Page.php
Technical SEO | | RouteAccounts0