Changing the way SEOmoz Detects Duplicate Content

KeriMorgret

Hey everyone,

I wanted to highlight today's blog post in case you missed it. In short, we're using a different algorithm to detect duplicate pages. http://moz.com/blog/visualizing-duplicate-web-pages

If you see a change in your crawl results and you haven't done anything, this is probably why. Here's more information taken directly from the post:

1. Fewer duplicate page errors: a general decrease in the number of reported duplicate page errors. However, it bears pointing out that:

**We may still miss some near-duplicates.** Like the current heuristic, only a subset of the near-duplicate pages is reported.
**Completely identical pages will still be reported.** Two pages that are completely identical will have the same simhash value, and thus a difference of zero as measured by the simhash heuristic. So, all completely identical pages will still be reported.

2. Speed, speed, speed: The simhash heuristic detects duplicates and near-duplicates approximately 30 times faster than the legacy fingerprints code. This means that soon, no crawl will spend more than a day working its way through post-crawl processing, which will facilitate significantly faster delivery of results for large crawls.
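For anyone curious why identical pages always come out with a difference of zero, here is a rough sketch of the general simhash idea in Python. This is not our crawler's code: the tokenizer, the use of MD5 as the per-token hash, the 64-bit width, and the function names `simhash` and `hamming_distance` are all assumptions made for illustration.

```python
import hashlib
import re

BITS = 64  # fingerprint width; an assumption for this sketch


def simhash(text: str) -> int:
    """Return a 64-bit simhash fingerprint of the text (illustrative only)."""
    counts = [0] * BITS
    # Naive tokenizer: lowercase word characters. Real systems often use
    # shingles or weighted features instead.
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(counts):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


page_a = "Blue widgets for sale. Free shipping on all blue widgets."
page_b = "Blue widgets for sale. Free shipping on all blue widgets."
page_c = "Blue widgets for sale. Free shipping on most blue widgets."

print(hamming_distance(simhash(page_a), simhash(page_b)))  # identical pages
print(hamming_distance(simhash(page_a), simhash(page_c)))  # one word changed
```

Identical text always produces the same fingerprint, so the first distance is 0. Pages that differ only slightly tend to produce fingerprints that differ in only a few bits, which is why a crawler can flag near-duplicates by comparing the distance against a threshold of its choosing.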

William.Lau

That is good news. It will ease some minds that are going nuts over the duplicate content reporting. Thanks!
