Changing the way SEOmoz Detects Duplicate Content
Hey everyone,
I wanted to highlight today's blog post in case you missed it. In short, we're using a different algorithm to detect duplicate pages. http://moz.com/blog/visualizing-duplicate-web-pages
If you see a change in your crawl results and you haven't done anything, this is probably why. Here's more information taken directly from the post:
1. Fewer duplicate page errors: a general decrease in the number of reported duplicate page errors. However, it bears pointing out that:
- **We may still miss some near-duplicates.** Like the current heuristic, only a subset of the near-duplicate pages is reported.
- **Completely identical pages will still be reported.** Two pages that are completely identical will have the same simhash value, and thus a difference of zero as measured by the simhash heuristic. So, all completely identical pages will still be reported.
2. Speed, speed, speed: The simhash heuristic detects duplicates and near-duplicates approximately 30 times faster than the legacy fingerprints code. This means that soon, no crawl will spend more than a day working its way through post-crawl processing, which will facilitate significantly faster delivery of results for large crawls.
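For anyone curious how a simhash comparison works in general, here is a minimal Python sketch. It is not the code our crawler runs: the 64-bit width, the word-level tokenizer, the MD5-based token hash, and the function names `simhash` and `hamming_distance` are all assumptions made for illustration. It only shows why identical pages always end up with a difference of zero.

```python
import hashlib
import re

BITS = 64  # fingerprint width; an assumption for this sketch, not a figure from the post


def _token_hash(token: str) -> int:
    """Hash one token to a 64-bit integer (MD5 is an arbitrary, illustrative choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text: str) -> int:
    """Build a fingerprint where each bit is a majority vote over all token hashes."""
    counts = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        h = _token_hash(token)
        for bit in range(BITS):
            # A set bit votes +1 for that position, a clear bit votes -1.
            counts[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(BITS):
        if counts[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


page_a = "Blue widgets for sale. Free shipping on all blue widgets."
page_b = "Blue widgets for sale. Free shipping on all blue widgets."

# Identical text yields identical fingerprints, so the difference is zero.
print(hamming_distance(simhash(page_a), simhash(page_b)))
```

In a scheme like this, two pages would be flagged as near-duplicates when the distance falls under some small threshold. The threshold is a tuning choice, which is one reason a heuristic of this kind can still miss some near-duplicates.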
-
That is good news. It will ease some minds that are going nuts over the duplicate content reporting. Thanks!