How does SEOmoz calculate duplicate content?
-
First of all, I have too much duplicate content on my website and am cleaning it up. But when I look at GWMC, it reports a lot less duplicate content than SEOmoz does. Can someone explain what the difference is?
Thanks, Leonie.
-
Hi Andre, thanks for the reply. I'll read it.
-
Moz doesn't just look at the text of a page; it also looks at the template and how "similar" the page appears compared to other pages.
Here's a quote from Dr. Pete:
"Our system currently uses a threshold of 95% to determine whether content is duplicated. This is based on the source code (not the text copy), so the amount of actual duplicate content may vary depending on the code/content ratio."
Here are a few articles you can read to get a deeper understanding.
http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
http://www.seomoz.org/blog/duplicate-content-block-redirect-or-canonical
http://www.seomoz.org/blog/the-illustrated-guide-to-duplicate-content-in-the-search-engines
http://www.seomoz.org/blog/rethinking-duplicate-content
http://www.seomoz.org/blog/fat-pandas-and-thin-content
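To make the 95% source-code threshold from Dr. Pete's quote more concrete, here is a rough sketch in Python. It is not Moz's actual algorithm, which isn't described in this thread. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the names `THRESHOLD`, `similarity`, `strip_tags`, `is_duplicate`, and the sample pages are all made up for illustration. It shows how two pages with different copy but a heavy shared template can pass the threshold on source code while their visible text stays well below it.

```python
import re
from difflib import SequenceMatcher

# The 95% figure comes from the quote; everything else here is an assumption.
THRESHOLD = 0.95


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings."""
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent characters in long strings, which would skew HTML comparisons.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def strip_tags(html: str) -> str:
    """Crude tag stripper for illustration only (not a real HTML parser)."""
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.split())


def is_duplicate(a: str, b: str, threshold: float = THRESHOLD) -> bool:
    """Flag two documents as duplicates if they meet the threshold."""
    return similarity(a, b) >= threshold


# A hypothetical template: lots of shared markup, very little unique copy.
BOILERPLATE = "<div class='widget'><span class='icon'></span></div>" * 30
TEMPLATE = (
    "<html><head><title>Shop</title></head><body>"
    + BOILERPLATE
    + "<p>{copy}</p></body></html>"
)

page_a = TEMPLATE.format(copy="Red widgets are in stock.")
page_b = TEMPLATE.format(copy="Our return policy lasts 30 days.")

# Compare the raw source code, then compare only the visible text.
source_score = similarity(page_a, page_b)
text_score = similarity(strip_tags(page_a), strip_tags(page_b))

print(f"source similarity: {source_score:.3f}")
print(f"text similarity:   {text_score:.3f}")
print("flagged on source:", is_duplicate(page_a, page_b))
print("flagged on text:  ", is_duplicate(strip_tags(page_a), strip_tags(page_b)))
```

In this toy case the two pages are flagged when compared by source code but not when compared by text alone. That is one plausible reason a source-based crawler could report more duplicates than a tool that weighs the visible content differently.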
Greg