API for testing duplicate content
-
Does anyone know a service or API or php lib to compare two (or more) pages and to return their similiarity (Level-3-Shingles).
API would be greatly prefered.
-
Hey Erica,
thanks for your answer. What I need is a way to decide on-the-fly whether two pages are similar or not. If they are too similar I need to depublish or at least rel canonical one of those.
Best solution would be an API that takes 2 pages, but it seems as if I have to build it myself then.
Thanks for your efforts.
-
While I don't know of an API that does that, you can set up your site using the SEOmoz tools and our Crawl Diagnostics section does look for Duplicate Cotent.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is this duplicate content that I should be worried about?
Our product descriptions appear in two places and on one page they appear twice. The best way to illustrate that would be to link you to a search results page that features one product. My duplicate content concern refers to the following, When the customer clicks the product a pop-up is displayed that features the product description (first showing of content) When the customer clicks the 'VIEW PRODUCT' button the product description is shown below the buy buytton (second showing of content), this is to do with the template of the page and is why it is also shown in the pop-up. This product description is then also repeated further down in the tabs (third showing of content). My thoughts are that point 1 doesn't matter as the content isn't being shown from a dedicated URL and it relies on javascript. With regards to point 2, is the fact the same paragraph appears on the page twice a massive issue and a duplicate content problem? Thanks
Technical SEO | | joe-ainswoth0 -
Duplicate content issue on Magento platform
We have a lot of duplicate pages (600 urls) on our site (total urls 800) built on the Magento e-commerce platform. We have the same products in a number of different categories that make it easy for people to choose which product suits their needs. If we enable the canonical fix in Magento will it dramatically reduce the number of pages that are indexed. Surely with more pages indexed (even though they are duplicates) we get more search results visibility. I'm new to this particular SEO issue. What do the SEO community have to say on this matter. Do we go ahead with the canonical fix or leave it?
Technical SEO | | PeterDavies0 -
Query Strings causing Duplicate Content
I am working with a client that has multiple locations across the nation, and they recently merged all of the location sites into one site. To allow the lead capture forms to pre-populate the locations, they are using the query string /?location=cityname on every page. EXAMPLE - www.example.com/product www.example.com/product/?location=nashville www.example.com/product/?location=chicago There are thirty locations across the nation, so, every page x 30 is being flagged as duplicate content... at least in the crawl through MOZ. Does using that query string actually cause a duplicate content problem?
Technical SEO | | Rooted1 -
Duplicate content and canonicalization confusion
Hello, http://bit.ly/1b48Lmp and http://bit.ly/1BuJkUR pages have same content and their canonical refers to the page itself. Yet, they rank in search engines. Is it because they have been targeted to different geographical locations? If so, still the content is same. Please help me clear this confusion. Regards
Technical SEO | | IM_Learner0 -
Hiding Duplicate Content using Javascript
We have e-commerce site selling books. Besides basic information on books, we have content for “About the book” , “Editorial Reviews”, “About the author” etc. But the content in all these section are duplicate and are available on all sites selling similar books. Our question is: 1.Should we worry about the content being duplicate?2.If yes, then will it by a good idea to hide this duplicate content using javascript or iframe?
Technical SEO | | CyrilWilson0 -
If two websites pull the same content from the same source in a CMS, does it count as duplicate content?
I have a client who wants to publish the same information about a hotel (summary, bullet list of amenities, roughly 200 words + images) to two different websites that they own. One is their main company website where the goal is booking, the other is a special program where that hotel is featured as an option for booking under this special promotion. Both websites are pulling the same content file from a centralized CMS, but they are different domains. My question is two fold: • To a search engine does this count as duplicate content? • If it does, is there a way to configure the publishing of this content to avoid SEO penalties (such as a feed of content to the microsite, etc.) or should the content be written uniquely from one site to the next? Any help you can offer would be greatly appreciated.
Technical SEO | | HeadwatersContent0 -
Masses (5,168 issues found) of Duplicate content.
Hi Mozzers, I have a site that has returned 5,168 issues with duplicate content. Where would you start? I started sorting via High page Authority first the highest being 28 all the way down to 1. I did want to use the rel=canonical tag as the site has many redirects already. The duplicates are caused by various category and cross category pages and search results such as ....page/1?show=2&sort=rand. I was thinking of going down the lines of a URL rewrite and changing the search anyway. Is it work redirecting everything in terms of results versus the effort of changing all the 5,168 issues? Thanks sm
Technical SEO | | Metropolis0 -
Duplicate Content
Many of the pages on my site are similar in structure/content but not exactly the same. What amount of content should be unique for Google to not consider it duplicate? If it is something like 50% unique would it be preferable to choose one page as the canonical instead of keeping them both as separate pages?
Technical SEO | | theLotter0