API for testing duplicate content
-
Does anyone know a service or API or php lib to compare two (or more) pages and to return their similiarity (Level-3-Shingles).
API would be greatly prefered.
-
Hey Erica,
thanks for your answer. What I need is a way to decide on-the-fly whether two pages are similar or not. If they are too similar I need to depublish or at least rel canonical one of those.
Best solution would be an API that takes 2 pages, but it seems as if I have to build it myself then.
Thanks for your efforts.
-
While I don't know of an API that does that, you can set up your site using the SEOmoz tools and our Crawl Diagnostics section does look for Duplicate Cotent.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Purchasing duplicate content
Morning all, I have a client who is planning to expand their product range (online dictionary sites) to new markets and are considering the acquisition of data sets from low ranked competitors to supplement their own original data. They are quite large content sets and would mean a very high percentage of the site (hosted on a new sub domain) would be made up of duplicate content. Just to clarify, the competitor's content would stay online as well. I need to lay out the pros and cons of taking this approach so that they can move forward knowing the full facts. As I see it, this approach would mean forgoing ranking for most of the site and would need a heavy dose of original content as well as supplementing the data on page to build around the data. My main concern would be that launching with this level of duplicate data would end up damaging the authority of the site and subsequently the overall domain. I'd love to hear your thoughts!
Technical SEO | | BackPack851 -
Duplicate content and canonicalization confusion
Hello, http://bit.ly/1b48Lmp and http://bit.ly/1BuJkUR pages have same content and their canonical refers to the page itself. Yet, they rank in search engines. Is it because they have been targeted to different geographical locations? If so, still the content is same. Please help me clear this confusion. Regards
Technical SEO | | IM_Learner0 -
Duplicate Content - Reverse Phone Directory
Hi, Until a few months ago, my client's site had about 600 pages. He decided to implement what is essentially a reverse phone directory/lookup tool. There are now about 10,000 reverse directory/lookup pages (.html), all with short and duplicate content except for the phone number and the caller name. Needless to say, I'm getting thousands of duplicate content errors. Are there tricks of the trade to deal with this? In nosing around, I've discovered that the pages are showing up in Google search results (when searching for a specific phone number), usually in the first or second position. Ideally, each page would have unique content, but that's next to impossible with 10,000 pages. One potential solution I've come up with is incorporating user-generated content into each page (maybe via Disqus?), which over time would make each page unique. I've also thought about suggesting that he move those pages onto a different domain. I'd appreciate any advice/suggestions, as well as any insights into the long-term repercussions of having so many dupes on the ranking of the 600 solidly unique pages on the site. Thanks in advance for your help!
Technical SEO | | sally580 -
Magento Multistore and Duplicate Content
Hey all, I am currently optimizing a Magento Multistore running with two store views (one per language). Now when I switch from one language to another the urls shows: mydomain.de/.../examplepage.html?___store=german&___from_store=english The same page can also be reached by just entering mydomain.de/.../examplepage.html The question is: Does Google consider this as Duplicate Content or it it nothing to worry about? Or should I just do a dynamic 301 redirect from the 1st version to the 2nd? I read about some hacks posted in diferent magento forums but as I am working for a customer I want to avoid hacks. Also setting "Add Store Code to Urls" didn't help.
Technical SEO | | dominator0 -
Is this considered Duplicate Content?
Good Morning, Just wondering if these pages are considered duplicate content? http://goo.gl/t9lkm http://goo.gl/mtfbf Can you please take a look and advise if it is considered duplicate and if so, what should i do to fix... Thanks
Technical SEO | | Prime850 -
Duplicate content problem from an index.php file
Hi One of my sites is flagging a duplicate content problem which is affecting the search rankings. The duplicate problem is caused by http://www.mydomain.com/index.php which has a page rank of 26 How can I sort the duplicate content problem, as the main page should just be http://www.mydomain.com which has a page rank of 42 and is the stronger page with stronger links etc Many Thanks
Technical SEO | | ocelot0 -
How much to change to avoid duplicate content?
Working on a site for a dentist. They have a long list of services that they want us to flesh out with text. They provided a bullet list of services, we're trying to get 1 to 2 paragraphs of text for each. Obviously, we're not going to write this off the top of our heads. We're pulling text from other sources and trying to rework. The question is, how much rephrasing do we have to do to avoid a duplicate content penalty? Do we make sure there are changes per paragraph, sentence, or phrase? Thanks! Eric
Technical SEO | | ericmccarty0 -
Duplicate content on my home
Hello, I have duplication with my home page. It comes in two versions of the languages: French and English. http://www.numeridanse.tv/fr/ http://www.numeridanse.tv/en/ You should know that the home page are not directories : http://www.numeridanse.tv/ Google indexes the three versions: http://bit.ly/oqKT0H To avoid duplicating what is the best solution?
Technical SEO | | android_lyon
Have a version of the default language? Thanks a lot for your answers. Take care. A.0