Changing the way SEOmoz Detects Duplicate Content
-
Hey everyone,
I wanted to highlight today's blog post in case you missed it. In short, we're using a different algorithm to detect duplicate pages. http://moz.com/blog/visualizing-duplicate-web-pages
If you see a change in your crawl results and you haven't done anything, this is probably why. Here's more information taken directly from the post:
1. Fewer duplicate page errors: a general decrease in the number of reported duplicate page errors. However, it bears pointing out that:
- **We may still miss some near-duplicates.** Like the current heuristic, only a subset of the near-duplicate pages is reported.
- **Completely identical pages will still be reported.** Two pages that are completely identical will have the same simhash value, and thus a difference of zero as measured by the simhash heuristic. So, all completely identical pages will still be reported.
2. Speed, speed, speed: The simhash heuristic detects duplicates and near-duplicates approximately 30 times faster than the legacy fingerprints code. This means that soon, no crawl will spend more than a day working its way through post-crawl processing, which will facilitate significantly faster delivery of results for large crawls.
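For anyone wondering what "a difference of zero" means in practice, here is a minimal sketch of the general simhash technique. It is not Moz's actual code: the word-level tokenization, the use of MD5 as the per-token hash, the 64-bit fingerprint width, and the names `simhash` and `hamming_distance` are all assumptions made for illustration.

```python
import hashlib
import re

BITS = 64  # fingerprint width; an assumed value, not taken from the post


def simhash(text: str) -> int:
    """Return a 64-bit simhash fingerprint of `text` (illustrative only)."""
    counts = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        # Hash each token to a stable 64-bit integer (MD5 is an arbitrary choice).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes sum to a positive number.
    fingerprint = 0
    for i, votes in enumerate(counts):
        if votes > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


page_a = "Filtered results for straight jeans from one manufacturer"
page_b = "Filtered results for straight jeans from one manufacturer"

# Identical text always yields identical fingerprints, so the distance is zero.
print(hamming_distance(simhash(page_a), simhash(page_b)))  # 0
```

In a sketch like this, two pages would be flagged as near-duplicates when the distance falls below some small cutoff. The cutoff Moz uses is not stated in the post, so none is shown here.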
-
That is good news. It will ease some minds that are going nuts over the duplicate content reporting. Thanks!
Related Questions
-
Unsolved Next JS and Missing content
Hello
Moz Pro | | 4thWhale
We recently migrated our page to Next JS, which is supposed to be great for SEO.
On almost all our pages we are getting the same errors: Missing Canonical Tag, Missing Title, Missing or Invalid H1, and Missing Description. We don't understand this because we have all of that content on every page. We believe that maybe Next JS has an incompatibility with Moz. Has anyone had any experience with this?0 -
Moz crawl duplicate pages issues
Hi,
According to the Moz crawl of my website, I have in the region of 800 pages which are considered internal duplicates. I'm a little puzzled by this, even more so as some of the pages it lists as being a duplicate of another are not. For example, the Moz crawler considers page B to be a duplicate of page A in the URLs below. (I'm not sure of the live link policy, so I've put a space in the URLs to 'unlive' them.)
Page A: http:// nuchic.co.uk/index.php/jeans/straight-jeans.html?manufacturer=3751
Page B: http:// nuchic.co.uk/index.php/catalog/category/view/s/accessories/id/92/?cat=97&manufacturer=3603
One is a filter page for Curvety Jeans and the other a filter page for Charles Clinkard Accessories. The page titles are different and the page content is different, so I've no idea why these would be considered duplicates. Thin maybe, but not duplicate.
Likewise, pages B and C are considered duplicates of page A in the following:
Page A: http:// nuchic.co.uk/index.php/bags.html?dir=desc&manufacturer=4050&order=price
Page B: http:// nuchic.co.uk/index.php/catalog/category/view/s/purses/id/98/?manufacturer=4001
Page C: http:// nuchic.co.uk/index.php/coats/waistcoats.html?manufacturer=4053
Again, these are product filter pages which the crawler would have found using the site filtering system, but, again, I cannot find what makes pages B and C a duplicate of A. Page A is a filtered result for Great Plains Bags (filtered from the general bags collection). Page B is the filtered results for Chic Look Purses from the Purses section, and Page C is the filtered results for Apricot Waistcoats from the Waistcoat section.
I'm keen to fix the duplicate content errors on the site before it goes properly live at the end of this month, which is why anyone kind enough to check the links will see a few design issues with the site. However, in order to fix the problem I first need to work out what it is, and I can't in this case.
Can anyone else see how these pages could be considered duplicates of each other, please? Just checking I've not gone mad!! Thanks, Carl
Moz Pro | | daedriccarl0 -
SEOMoz API not working for Scrapebox
I want to import SEOMoz data for a list of URLs I have using Scrapebox. I added my credentials according to the API but am getting error 401 as the status of all my links. Any idea why, and what I should be doing?
Moz Pro | | theLotter0 -
Duplicate Content
My website is hosted by Hubspot. With each blog article I write, I can tag it to be listed in a specific category. As an example, one blog article may have three tags or categories that it fits in. SEOmoz is seeing this as a duplication of content. In other words, if you go to the different category pages, the same article would be listed on all three pages, even though it is just one article. However, I only have 36 duplicate content warnings and I have 150 blog articles, each having 2 or 3 tags (categories), so there should be many more than 36 duplications. Is this something that affects my SEO, or should I just ignore the problem and check these warnings as fixed? Thanks,
Moz Pro | | Rong
Ron0 -
Getting rid of duplicate content
Hi everyone, I'm a newbie and at the moment don't know very much about SEO. I have a problem with some of my campaigns where I keep getting a report with Duplicate Page and/or Duplicate Content errors. I have no idea how to rectify these errors or fix them on the relevant websites. Can anyone please help explain how to do this, maybe step by step? I really appreciate your views and opinions! Regards, Hugh
Moz Pro | | DigitalAcademyZA0 -
Does SEOMOZ use Google Search?
I run a Joomla site and have noticed users having a hard time finding what they are looking for. The default Joomla search is lacking, and the latest Joomla search component I added is better but still not great. I've always been able to find what I'm looking for on SEOmoz. Do you guys use Google Site Search?
Moz Pro | | mr_w0 -
SEOMoz Link Analysis Not Updating?
Hi there, I am wondering why the Link Analysis in SEOMoz takes so long to recognise new backlinks. I have had the same figures showing for months, when I know that I have added a lot more backlinks. They show in my Google Webmaster account, and I have also confirmed they exist by searching directly in Google's search engine. How often does the Link Analysis section update itself? Are these figures worth following at all or are they useless? Thanks!
Moz Pro | | onlineexpression
Karl0 -
SEOmoz bot and "noindex"
As a recent newbie to SEOmoz, I've been implementing some suggestions and doing a general tidy up. I removed URLs from our robots.txt and instead rolled out the noindex meta tag to pages we don't want indexed. But I'm surprised to see issues flagged by the last Moz bot crawl on pages that have this meta tag. Does the SEOmoz bot not ignore this tag? Just want to make sure I've implemented it correctly, so the Google bot does ignore it. Meta tag syntax is and is placed below the title tag. cheers Steve
Moz Pro | | sjr4x40