Changing the way SEOmoz Detects Duplicate Content
-
Hey everyone,
I wanted to highlight today's blog post in case you missed it. In short, we're using a different algorithm to detect duplicate pages. http://moz.com/blog/visualizing-duplicate-web-pages
If you see a change in your crawl results and you haven't done anything, this is probably why. Here's more information taken directly from the post:
1. Fewer duplicate page errors: a general decrease in the number of reported duplicate page errors. However, it bears pointing out that:
- **We may still miss some near-duplicates.** Like the current heuristic, only a subset of the near-duplicate pages is reported.
- **Completely identical pages will still be reported.** Two pages that are completely identical will have the same simhash value, and thus a difference of zero as measured by the simhash heuristic. So, all completely identical pages will still be reported.
2. Speed, speed, speed: The simhash heuristic detects duplicates and near-duplicates approximately 30 times faster than the legacy fingerprints code. This means that soon, no crawl will spend more than a day working its way through post-crawl processing, which will facilitate significantly faster delivery of results for large crawls.
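For anyone curious how a simhash comparison works in general, here is a minimal sketch. It is not the code Moz runs: the word-level tokenization, the MD5-based token hash, the 64-bit width, and the `threshold` value are all assumptions made for illustration. It does show why identical pages always have a difference of zero: the same text always produces the same fingerprint.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Return a simhash fingerprint built from the words in `text`."""
    counts = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # Stable per-token hash, truncated to `bits` bits (illustrative choice).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a: str, text_b: str, threshold: int = 3) -> bool:
    """Treat two texts as near-duplicates if their fingerprints differ in few bits."""
    return hamming_distance(simhash(text_a), simhash(text_b)) <= threshold
```

Comparing two fixed-size integers is cheap, which is the general reason fingerprint approaches like this tend to be fast on large crawls.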
-
That is good news. It will ease some minds that are going nuts over the duplicate content reporting. Thanks!
Related Questions
-
Moz shows duplicate content, but URLs are tagged with campaign tags
Crawl diagnostics shows a lot of pages with duplicate content, but when I check the details, I see that it lists the same page with a campaign tag in the URL, so it's not really another page serving identical content. Is there a way to remove these pages from Crawl Diagnostics?
Moz Pro | | jorisbrabants0 -
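One way to confirm that a set of reported URLs differs only by campaign tags is to normalize them before comparing. The sketch below assumes the tags are `utm_`-prefixed query parameters, which the question does not actually state; `strip_campaign_tags` and its `prefixes` argument are hypothetical names, and this is not a Moz feature.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_campaign_tags(url: str, prefixes: tuple = ("utm_",)) -> str:
    """Return `url` without query parameters whose names start with a campaign prefix."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(prefixes)
    ]
    # Rebuild the URL with only the non-campaign parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

If two reported duplicates normalize to the same string, they are the same page reached through different tagged links.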
Since July 1, we've had a HUGE jump in errors on our weekly crawl. We don't think anything has changed on our website. Has MOZ changed something that would account for a large leap in duplicate content and duplicate title errors?
Our error report went from 1,900 to 18,000 in one swoop, starting right around the first of July. The errors are duplicate content and duplicate title, as if it does not see our 301 redirects. Any insights?
Moz Pro | | KristyFord0 -
Duplicate Content, Canonicalization may not work in our scenario.
I'm new to SEO (so please excuse the lack of terminology) and will be taking over our company's inbound marketing completely. I previously just did data analysis and managed our PPC campaigns within Google and Bing/Yahoo; now I get all three, yippee! But I digress.

Before I get started here, I did read http://moz.com/community/q/new-client-wants-to-keep-duplicate-content-targeting-different-cities?sort=most_helpful and I found both the answers there to be helpful, but indirect for my scenario. I'm conducting our company's first real SEO audit (thanks Moz for the guide there), and duplicate content is going to be our number one problem to tackle.

Our company's website was designed back in 2009, with the file structure /city-name/product-name. The problem with this is that we are open in over 50 cities now (and headed to 100 fast), and we are starting to amass duplicate content. Five products (and expanding), times the locations... you get it.

My question(s): How should I deal with this? The pages are almost identical, except that they list different information for each product depending upon its location. However, for one of our products, Moz's own tools (Pro) did not find all the duplicate content, but did find some. I'm assuming that's because the pages have different course options and a different course address; it boils down to a different address at the very bottom of the body and different course options in the right sidebar. The other four products' duplicate content was found and marked extensively.

If I choose to use canonicalization to link all the pages to one main page, I believe that would pass all the link juice to that one page, but we would no longer show in a Google search for the other cities, e.g. "washington DC example product name". Correct me if I'm wrong here. **Should I worry about the product whose duplicate content was only marked four times out of fifty cities?** I feel as if this question answers itself, but I would still like someone who knows more than me to shed some light on this issue.

The other four products are not going to be an issue, as they are only offered online, but they still follow the same file structure with /online in place of /city-name. These will be canonicalized together under the /online location.

One last thing I will mention here: having the city name in the URL gives us a nice advantage (I think) when people are searching for products in cities where we offer our product. (Correct me again.) If this is not the case, I believe I could talk our team into restructuring the files, if you think that's our best option.

Some things you need to know about our site: We use a cookie for the location. Once you land on a page that has a location tied to it, the cookie is updated and saved. If the location does not exist, you are redirected to a page to choose a location. I'm pretty sure this can cause some SEO issues too, but once again I'm not sure.

I know this is a wall of text, but I cannot tell you enough how appreciative I am in advance for your informative answers. Thanks a million, Trenton
Moz Pro | | PM_Academy0 -
I want to create a report of only the duplicate content pages as a CSV file so I can create a script to canonicalize them.
I want to create a report of only the duplicate content pages as a CSV file so I can create a script to canonicalize them. So I would get something like: http://example.com/page1, http://example.com/page2, http://example.com/page3, http://example.com/page4, Right now I have to open each one in "Issue: Duplicate Page Content", and this takes a lot of time. The same goes for duplicate page titles.
Moz Pro | | nvs.nim0 -
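As a rough idea of what such a script could look like once a CSV exists, here is a sketch. The layout is an assumption based on the example row in the question: each row is one group of duplicate URLs, and the first URL in the row is treated as the canonical one. `canonical_tags` is a hypothetical helper, not part of any Moz export or API.

```python
import csv
import io


def canonical_tags(csv_text: str) -> dict:
    """Map each duplicate URL to a canonical <link> tag pointing at the first URL in its row."""
    tags = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Drop padding spaces and the empty field left by a trailing comma.
        urls = [url.strip() for url in row if url.strip()]
        if len(urls) < 2:
            continue  # nothing to canonicalize in this row
        canonical = urls[0]
        for duplicate in urls[1:]:
            tags[duplicate] = f'<link rel="canonical" href="{canonical}" />'
    return tags
```

Each value is the tag that would go in the `<head>` of the page named by its key; which URL in a group should really be canonical is a decision the script cannot make for you.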
Joined SEOmoz but nothing changed.
Hi. I am new to SEOmoz. I joined Pro monthly and created 2 campaigns, but everything shows me 0 and I guess nothing has changed from before. Please see this:

| | StickerApt |
| --- | --- |
| Domain Authority | 1 |
| Domain MozRank | 0.00 |
| Domain MozTrust | 0.00 |
| External Followed Links | 0 |
| Total External Links | 0 |
| Total Links | 0 |
| Followed Linking Root Domains | 0 |
| Total Linking Root Domains | 0 |
| Linking C-Blocks | 0 |

| Subdomain Metrics | StickerApt | canadastickerking | stickeryou | stickybusiness |
| --- | --- | --- | --- | --- |
| Subdomain MozRank | 0.00 | 4.31 | 5.26 | 4.56 |
| Subdomain MozTrust | 0.00 | 3.78 | 5.43 | 4.87 |
| External Followed Links | 0 | 71 | 38,649 | 631 |
| Total External Links | 0 | 73 | 38,814 | 1,265 |
| Total Links | 0 | 3,805 | 235,124 | 26,337 |
| Followed Linking Root Domains | 0 | 7 | 243 | 115 |
| Total Linking Root Domains | 0 | 7 | 286 | 161 |

Followed vs NoFollowed Links and Followed vs NoFollowed Linking Root Domains (StickerApt): "No followed (0%)", "No nofollowed (0%)"

| Week ending: | Total 9/9 | Change | Total 9/16 | Branded Keywords 9/9 | Change | Branded Keywords 9/16 | Non-branded Keywords 9/9 | Change | Non-branded Keywords 9/16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Organic Search Visits | 45 | -18% | 37 | 7 | -14% | 6 | 38 | -18% | 31 |
| URLs Receiving Entrances Via Search | 8 | 25% | 10 | 4 | -50% | 2 | 4 | 100% | 8 |
| Non-Paid Keywords Sending Search Visits | 20 | -10% | 18 | 6 | -33% | 4 | 14 | 0% | 14 |

Can anyone help me with what I should do with SEOmoz? All the keywords I set up tell me "not in top 50" for ranking. I set up 12 keywords to start. Also, there are crawl errors on my site, but they cannot be fixed because the pages getting the errors are automatic quote pages made in PHP. They are supposed to be duplicate pages that share content with other pages. It's hard to explain, but you will see what I am talking about: http://www.stickerapt.com/quote.php It has a duplicate title and content, and it should be like that. I cannot change the duplicate errors. Is it going to affect page rank? Please help.. ㅡㅡ
Moz Pro | | bratt0 -
SEOMoz reports and 404 errors
My SEOMoz report shows a 404 error, found today, for this URL: http://globalheavyhaul.com/google.com I do not have this anchor text anywhere on my website. How did Roger figure out that somebody looked for that page? Do I need to worry about 404 errors that are the result of user mistakes, instead of actual bad links?
Moz Pro | | FreightBoy0 -
Duplicate content & canonicals
Hi, I'm working on a website for a company that operates in different European countries. The setup is like this:
www.website.eu/nl
www.website.eu/be
www.website.eu/fr
...
As you can see, every country has its own subdirectory, but NL and BE share the same language, Dutch. The copywriter wrote some unique content for NL and for BE, but it isn't possible to write unique content for every product detail page, because it's pretty technical stuff that goes into those pages. Now we want to add canonical tags to those identical product pages. Do we point the canonical on the /be products to the /nl products, or vice versa?
Another question regarding SEOmoz: if we add canonical tags to x pages, do they still appear in the Crawl Errors under "duplicate page content", or do we have to do our own math and just calculate "duplicate page content" minus "Rel canonical"?
Moz Pro | | nvs.nim0 -
How do I change my domain?
My domain was updated to use www by default. How do I update it in the profile?
Moz Pro | | dsfsystemsseo0