Changing the way SEOmoz Detects Duplicate Content
-
Hey everyone,
I wanted to highlight today's blog post in case you missed it. In short, we're using a different algorithm to detect duplicate pages. http://moz.com/blog/visualizing-duplicate-web-pages
If you see a change in your crawl results and you haven't done anything, this is probably why. Here's more information taken directly from the post:
1. Fewer duplicate page errors: a general decrease in the number of reported duplicate page errors. However, it bears pointing out that:
- **We may still miss some near-duplicates. **Like the current heuristic, only a subset of the near-duplicate pages is reported.
- **Completely identical pages will still be reported. **Two pages that are completely identical will have the same simhash value, and thus a difference of zero as measured by the simhash heuristic. So, all completely identical pages will still be reported.
2. Speed, speed, speed: The simhash heuristic detects duplicates and near-duplicates approximately 30 times faster than the legacy fingerprints code. This means that soon, no crawl will spend more than a day working its way through post-crawl processing, which will facilitate significantly faster delivery of results for large crawls.
-
That is good news. It will ease some minds that are going nuts over the duplicate content reporting. Thanks!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content : what are best solutions
Hello i am a beginner in website, I got my first report the report saying that there are some duplicate content. I would like to know What can I do to solve that issue ?
Moz Pro | | Dieumerci0 -
Duplicate Page
I just Check Crawl the status error with Duplicate Page Content. As Mentioned Below. Songs.pk | Download free mp3, Hindi Music, Indian Mp3 Songs http://www.getmp3songspk.com Songs.pk | Download free mp3, Hindi Music, Indian Mp3 Songs http://getmp3songspk.com and then i added these lines to my htaccess file RewriteBase /
Moz Pro | | Getmp3songspk
RewriteCond %{HTTP_HOST} !^www.getmp3songspk.com$ [NC]
RewriteRule ^(.*)$ http://www.getmp3songspk.com/$1 [L,R=301] But Still See that error again when i crawl a new test.0 -
Duplicate Page Content, Indexing and Rel Canonical Just DOUBLED! Need Advice to Fix
Last Friday (Penguin 5/2.1) my website shot way off the grid and I noticed in my MOZ PRO Campaign dashboard that all of the following just doubled in numbers on my website: duplicate page content, Google indexing, and rel canonicals. I also noticed that some of my pages, images, tags and categories now added a /page/2/ or a -2. I just changed noindex for tags, but indexing for media, pages, posts, and categories. I'm currently using All In One SEO for a plugin. Any advice would be much appreciated as I'm stuck on the issue. relconical.png Duplicate-Page-Content.png [Duplicate Content II](Duplicate Content II) index1.png
Moz Pro | | CelebrityPersonalTrainer0 -
Is there any way to export the organic traffic data?
All of the other Moz pages allow us to export for reports...why can I not export the organic traffic data page? I want a one stop shop...I don't want to have to go to Analytics! Don't make me leave the Moz systemmmmm! 🙂
Moz Pro | | Jeannie.0 -
Update in Moz spider/tools?? Flagging duplicate content / ignoring canonical
Hi all, Has there been an update in the SEOmoz crawling software? We now have thousands of dupe content/page title warnings for paginated product page URLs that have correctly formatted canonicals. e.g. http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx ... has following pages with identical content that have been flagged: http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx?p=true&rspage=4 http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx?p=true&rspage=6 http://www.woolovers.com/british-wool/mens/olive-green/wool-countryman-suede-patch-sweater.aspx?p=true&rspage=4 ..plus 4 more URL's. But they all have canonical set. There's even a notice at the bottom of report that tells us there's a canonical set to http://www.woolovers.com/british-wool/mens/tweed-green/wool-countryman-suede-patch-sweater.aspx What gives, SEOmoz ?? Thanks Michael
Moz Pro | | LawrenceNeal0 -
Is it me or is SEOMOZ Q&A slow
I try and answer many questions here on SEOMoz Q&A, but the page load speed seems so slow I think my browser is going to time out. I don't remember it being this slow. Anybody else noticing slow loading (especially when asking, answering, or replying to questions)?
Moz Pro | | Francisco_Meza1 -
Using Seomoz for Site Evaluation am I up to par ?
Just wanted to see how people using the seomoz bar would rate a four month old site with Domain-Homepage Authority of 27 Mozrank of 5.08 and Moztrust of 5.65 . I've read up on all the factors but just wanted to know if Im up to par on building a great site thats search engine friendly. Inner pagers are on a PA of 20 and around the same mozrank and moztrust levels of +- 5.
Moz Pro | | NikolasNikolaou0 -
How to resolve Duplicate Content crawl errors for Magento Login Page
I am using the Magento shopping cart, and 99% of my duplicate content errors come from the login page. The URL looks like: http://www.site.com/customer/account/login/referer/aHR0cDovL3d3dy5tbW1zcGVjaW9zYS5jb20vcmV2aWV3L3Byb2R1Y3QvbGlzdC9pZC8xOTYvY2F0ZWdvcnkvNC8jcmV2aWV3LWZvcm0%2C/ Or, the same url but with the long string different from the one above. This link is available at the top of every page in my site, but I have made sure to add "rel=nofollow" as an attribute to the link in every case (it is done easily by modifying the header links template). Is there something else I should be doing? Do I need to try to add canonical to the login page? If so, does anyone know how to do it using XML?
Moz Pro | | kdl01