API for testing duplicate content
-
Does anyone know of a service, API, or PHP library that compares two (or more) pages and returns their similarity (level-3 shingles)?
An API would be greatly preferred.
-
Hey Erica,
thanks for your answer. What I need is a way to decide on the fly whether two pages are similar or not. If they are too similar, I need to unpublish one of them or at least add a rel=canonical to it.
The best solution would be an API that takes two pages, but it seems I will have to build it myself.
Thanks for your efforts.
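
For anyone who ends up building this themselves, here is a minimal sketch of a shingle comparison in Python. It assumes "level-3 shingles" means overlapping 3-word sequences and that the visible text of each page has already been extracted (fetching and HTML stripping are left out). The function names and the 0.8 threshold are hypothetical choices for illustration, not something from an existing service or library.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word k-grams) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0  # two empty texts are considered identical
    return len(a & b) / len(a | b)


def similarity(text_a, text_b, k=3):
    """Shingle-based similarity between two texts, from 0.0 to 1.0."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))


def is_near_duplicate(text_a, text_b, threshold=0.8, k=3):
    """Hypothetical decision rule: flag pairs at or above the threshold."""
    return similarity(text_a, text_b, k) >= threshold


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox jumps over the lazy cat"
    print(similarity(a, b))  # 6 shared shingles out of 8 distinct -> 0.75
```

The threshold would need tuning against real pages: shared templates (navigation, footers) inflate the score, so it probably makes sense to compare only the main content area.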
-
While I don't know of an API that does that, you can set up your site in the SEOmoz tools; our Crawl Diagnostics section does look for duplicate content.
Related Questions
-
Content incorrectly being duplicated on microsite
So bear with me here, as this is probably a technical issue and I am not that technical. We have a microsite for one of our partner organisations, and we recently detected content from our main site appearing under the microsite's URLs, both in search results and when you click through to the SERP. However, this content does not exist on the actual website at all. Does anyone have a possible explanation for this? I have tried searching the web but found nothing. I assume something in the setup of the microsite is associating it with the content on the main site.
Technical SEO | | Discovery_SA0 -
Duplicate content
Hello mozzers, I have an unusual question. I've created a page that I'm fully aware is nearly 100% duplicate content. It quotes the law, so it can't be changed. The page is very linkable in my niche. Is there a way I can build quality links to it that benefit my overall website's DA (I'm not bothered about the linkable page itself ranking) without risking Panda/duplicate content issues? Thanks, Peter
Technical SEO | | peterm21 -
How to solve Parameter Issue causing Duplicate Content
Hi everyone, my site's home page comes up in the SERPs with the following URL: www.sitename/?referer=indiagrid My question is: should I disallow it in robots.txt, or 301 redirect it to the home page? The other issue is that I have a few dynamically generated URLs for a form: http://www.www.sitename/career-form.php?position=SEO Executive I have set the parameter "position" under URL Parameters in GWT, but my pages are still being indexed, which leads to duplicate page content. Please help me out.
Technical SEO | | himanshu3019890 -
How to fix duplicate page content error?
SEOmoz's Crawl Diagnostics is complaining about a duplicate page error. Examples of links that have the duplicate page content error are http://www.equipnet.com/misc-spare-motors-and-pumps_listid_348855 and http://www.equipnet.com/misc-spare-motors-and-pumps_listid_348852 These are not duplicate pages. Some values differ between the two pages, such as listing #, EquipNet tag #, and price. I am not sure how to highlight the things that differ between the two pages, like the "Equipment Tag # and listing #". Would the errors resolve if I used some style attribute to highlight such values on the page? Please help me with this, as I am not really sure why the SEO tool thinks both pages have the same content. Thanks!
Technical SEO | | RGEQUIPNET0 -
I have a ton of "duplicated content", "duplicated titles" in my website, solutions?
Hi, and thanks in advance. I have a JomSocial site with 1000 users. It is highly customized, and as a result of that customization some pages have five or more different URLs pointing to the same page. Google has already indexed 16,000 links, and the crawl report shows a lot of duplicated content. These links are important for some of the functionality; they are dynamically created and will keep growing. My developers offered to create rules in the robots file so that a large share of these links don't get indexed, but a Google Webmaster Tools post says the following: "Google no longer recommends blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools." Here is an example of the links:
http://anxietysocialnet.com/profile/edit-profile/salocharly
http://anxietysocialnet.com/salocharly/profile
http://anxietysocialnet.com/profile/preferences/salocharly
http://anxietysocialnet.com/profile/salocharly
http://anxietysocialnet.com/profile/privacy/salocharly
http://anxietysocialnet.com/profile/edit-details/salocharly
http://anxietysocialnet.com/profile/change-profile-picture/salocharly
So the question is: is this really that bad? What are my options? Is it really a good solution to set rules in robots.txt so that big chunks of the site don't get indexed? Is there any other way I can resolve this? Thanks again! Salo
Technical SEO | | Salocharly0 -
Google Duplicate Content Penalty On My Own Site?
I am certain that I have hit a Google penalty filter on my site http://www.playpokeronline.ca for my main keywords, "play poker online". In google.ca I rank 670th, and I used to be on the first page, between 1 and 10, in June. On Bing I am around 9th. I found the entire site duplicated as follows: Original: www.playpokeronline.ca Duplicate: www.playpokeronline.ca/playpokeronline/ This duplicate was not intentional and seems to be a result of my hosting at GoDaddy. It exists for every page on my site and shows up in Webmaster Tools. I blocked the duplicate with robots.txt, then a few days ago dropped that and added a rel=canonical tag to the top of each page. Visitors dropped from 100 per day in August to 12-20 in the last month. Google says that if duplicate content is created to try to game the SERPs, they may filter or penalize the site. Have I triggered this penalty, or a different sort of over-optimization penalty? Will the rel=canonical tags fix this, or should I do something else? This penalty business is not my idea of a good time. Thank you, Jeb
Technical SEO | | PokerCanada0 -
Duplicate Content and Canonical use
We have a pagination issue, which the developers seem reluctant (or unable) to fix, whereby three copies of the same page (with slightly differing URLs) come up on different pages of the archived article index. The indexing convention was very poorly thought out by the developers and has left us with the same article on, for example, pages 1, 2 and 3 of the article index, hence the duplication. Is this a clear-cut case for using a canonical tag? I am quite concerned this is going to have a negative impact on ranking, of course. Cheers, Martin
Technical SEO | | Martin_S0 -
Duplicate Content Home Page
Hello, I am getting a Duplicate Content warning from SEOmoz for my home page: http://www.teacherprose.com http://www.teacherprose.com/index.html I tried the code below in .htaccess: redirect 301 /index.html http://www.teacherprose.com This caused a "too many redirects" error in the browser. Any thoughts? Thank you, Eric
Technical SEO | | monthelie10