Changing the way SEOmoz Detects Duplicate Content

KeriMorgret

Hey everyone,

I wanted to highlight today's blog post in case you missed it. In short, we're using a different algorithm to detect duplicate pages. http://moz.com/blog/visualizing-duplicate-web-pages

If you see a change in your crawl results and you haven't done anything, this is probably why. Here's more information taken directly from the post:

1. Fewer duplicate page errors: a general decrease in the number of reported duplicate page errors. However, it bears pointing out that:

**We may still miss some near-duplicates. **Like the current heuristic, only a subset of the near-duplicate pages is reported.
**Completely identical pages will still be reported. **Two pages that are completely identical will have the same simhash value, and thus a difference of zero as measured by the simhash heuristic. So, all completely identical pages will still be reported.

2. Speed, speed, speed: The simhash heuristic detects duplicates and near-duplicates approximately 30 times faster than the legacy fingerprints code. This means that soon, no crawl will spend more than a day working its way through post-crawl processing, which will facilitate significantly faster delivery of results for large crawls.

William.Lau

That is good news. It will ease some minds that are going nuts over the duplicate content reporting. Thanks!

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Changing the way SEOmoz Detects Duplicate Content

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

What's more valuable: new content or optimizing old content

Duplicate Pages

Why seomoz crawler does not see my snapshot?

How do I change the update date for keywords ranking?

Duplicate Title

About Duplicate Content found by SEOMOZ... that is not duplicate

Blogger Duplicate Content? and Canonical Tag

How to set up SEOMOZ to track multilanguage sites?