API for testing duplicate content
-
Does anyone know a service or API or php lib to compare two (or more) pages and to return their similiarity (Level-3-Shingles).
API would be greatly prefered.
-
Hey Erica,
thanks for your answer. What I need is a way to decide on-the-fly whether two pages are similar or not. If they are too similar I need to depublish or at least rel canonical one of those.
Best solution would be an API that takes 2 pages, but it seems as if I have to build it myself then.
Thanks for your efforts.
-
While I don't know of an API that does that, you can set up your site using the SEOmoz tools and our Crawl Diagnostics section does look for Duplicate Cotent.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Canonical Tags - Do they only apply to internal duplicate content?
Hi Moz, I've had a complaint from a company who we use a feed from to populate a restaurants product list.They are upset that on our products pages we have canonical tags linking back to ourselves. These are in place as we have international versions of the site. They believe because they are the original source of content we need to canonical back to them. Can I please confirm that canonical tags are purely an internal duplicate content strategy. Canonical isn't telling google that from all the content on the web that this is the original source. It's just saying that from the content on our domains, this is the original one that should be ranked. Is that correct? Furthermore, if we implemented a canonical tag linking to Best Restaurants it would de-index all of our restaurants listings and pages and pass the authority of these pages to their site. Is this correct? Thanks!
Technical SEO | | benj20341 -
Duplicate content on user queries
Our website supports a unique business industry where our users will come to us to look for something very specific (a very specific product name) to find out where they can get it. The problem that we're facing is that the products are constantly changing due to the industry. So, for example, one month, one product might be found on our website, and the next, it might be removed completely... and then might come back again a couple months later. All things that are completely out of our control - and we have no way of receiving any sort of warning when these things might happen. Because of this, we're seeing a lot of duplicate content issues arise... For Example... Product A is not active today... so www.mysite.com/search/productA will return no results... Product B is also not active today... so www.mysite.com/search/productB will also return no results. As per Moz Analytics, these are showing up as duplicate content because both pages indicate "No results were found for {your searched term}." Unfortunately, it's a bit difficult to return a 204 in these situations (which I don't know if a 204 would help anyway) or a 404, because, for a faster user experience, we simultaneously render different sections of the page... so in the very beginning of the page load - we start rendering the faster content (template type of content) that says "returning 200 code, we got the query successfully & we're loading the page".. the unique content results finish loading last since they take the longest. I'm still very new to the SEO world, so would greatly appreciate any ideas or suggestions that might help with this... I'm stuck. 😛 Thanks in advance!
Technical SEO | | SFMoz0 -
Duplicate content - wordpress image attachement
I have run my seomoz campaign through my wordpress site and found duplicate content. However, all of this duplicate content was either my logo or images and no content with addresses like /?attachement_id=4 for example . How should I resolve this? thank you.
Technical SEO | | htmanage0 -
Duplicate Content on Navigation Structures
Hello SEOMoz Team, My organization is making a push to have a seamless navigation across all of its domains. Each of the domains publishes distinctly different content about various subjects. We want each of the domains to have its own separate identity as viewed by Google. It has been suggested internally that we keep the exact same navigation structure (40-50 links in the header) across the header of each of our 15 domains to ensure "unity" among all of the sites. Will this create a problem with duplicate content in the form of the menu structure, and will this cause Google to not consider the domains as being separate from each other? Thanks, Richard Robbins
Technical SEO | | LDS-SEO0 -
Press Releases & Duplicate Content
How do you do press releases without duplicating the content? I need to post it on my website along with having it on PR websites. But isn't that considered bad for SEO since it's duplicate content?
Technical SEO | | MercyCollege0 -
Mod Rewrite question to prevent duplicate content
Hi, I'm having problems with a mod rewrite issue and duplicate content On my website I have Website.com Website.com/directory Website.com/directory/Sub_directory_more_stuff_here Both #1 and #2 are the same page (I can't change this). #3 is different pages. How can I use mod rewrite to to make #2 redirect to #1 so I don't have duplicate content WHILE #3 still works?
Technical SEO | | kat20 -
I am Posting an article on my site and another site has asked to use the same article - Is this a duplicate content issue with google if i am the creator of the content and will it penalize our sites - or one more than the other??
I operate an ecommerce site for outdoor gear and was invited to guest post on a popular blog (not my site) for a trip i had been on. I wrote the aritcle for them and i also will post this same article on my website. Is this a dup content problem with google? and or the other site? Any Help. Also if i wanted to post this same article to 1 or 2 other blogs as long as they link back to me as the author of the article
Technical SEO | | isle_surf0 -
How to Solve Duplicate Page Content Issue?
I have created one campaign over SEOmoz tools for my website. I have found 89 duplicate content issue from report. Please, look in to Duplicate Page Content Issue. I am quite confuse to resolve this issue. Can any one suggest me best solution to resolve it?
Technical SEO | | CommercePundit0