How do SEOMOZ calculate duplicate content?
-
first of all i have to much duplicate stuff on my website end cleaning it up. But if i look at GWMC the duplicate stuff is a lot less than in SEOMOZ? can someone explain to me what the difference is?
Thnx, Leonie.
-
Hi Andre, Thnx for the reply. i'll read it
-
Moz doesn't just look at the text of a page, it also looks at the template and how "similar" it appears compared to other pages.
Here's a quote from Dr. Pete:
"Our system currently uses a threshold of 95% to determine whether content is duplicated. This is based on the source code (not the text copy), so the amount of actual duplicate content may vary depending on the code/content ratio."
Here are a few articles you can read to get a deeper understanding.
http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
http://www.seomoz.org/blog/duplicate-content-block-redirect-or-canonical
http://www.seomoz.org/blog/the-illustrated-guide-to-duplicate-content-in-the-search-engines
http://www.seomoz.org/blog/rethinking-duplicate-content
http://www.seomoz.org/blog/fat-pandas-and-thin-content
http://www.seomoz.org/blog/the-illustrated-guide-to-duplicate-content-in-the-search-engines
Greg
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How Does Google View Hidden Content?
I have a website which contains a lot of content behind a show hide, does Google crawl the "hidden" copy?
Web Design | | jasongmcmahon0 -
Curious why site isn't ranking, rather seems like being penalized for duplicate content but no issues via Google Webmaster...
So we have a site ThePowerBoard.com and it has some pretty impressive links pointing back to it. It is obviously optimized for the keyword "Powerboard", but in no way is it even in the top 10 pages of Google ranking. If you site:thepowerboard.com the site, and/or Google just the URL thepowerboard.com you will see that it populates in the search results. However if you quote search just the title of the home page, you will see oddly that the domain doesn't show up rather at the bottom of the results you will see where Google places "In order to show you the most relevant results, we have omitted some entries very similar to the 7 already displayed". If you click on the link below that, then the site shows up toward the bottom of those results. Is this the case of duplicate content? Also from the developer that built the site said the following: "The domain name is www.thepowerboard.com and it is on a shared server in a folder named thehoverboard.com. This has caused issues trying to ssh into the server which forces us to ssh into it via it’s ip address rather than by domain name. So I think it may also be causing your search bot indexing problem. Again, I am only speculating at this point. The folder name difference is the only thing different between this site and any other site that we have set up." (Would this be the culprit? Looking for some expert advice as it makes no sense to us why this domain isn't ranking?
Web Design | | izepper0 -
Lots of Listing Pages with Thin Content on Real Estate Web Site-Best to Set them to No-Index?
Greetings Moz Community: As a commercial real estate broker in Manhattan I run a web site with over 600 pages. Basically the pages are organized in the following categories: 1. Neighborhoods (Example:http://www.nyc-officespace-leader.com/neighborhoods/midtown-manhattan) 25 PAGES Low bounce rate 2. Types of Space (Example:http://www.nyc-officespace-leader.com/commercial-space/loft-space)
Web Design | | Kingalan1
15 PAGES Low bounce rate. 3. Blog (Example:http://www.nyc-officespace-leader.com/blog/how-long-does-leasing-process-take
30 PAGES Medium/high bounce rate 4. Services (Example:http://www.nyc-officespace-leader.com/brokerage-services/relocate-to-new-office-space) High bounce rate
3 PAGES 5. About Us (Example:http://www.nyc-officespace-leader.com/about-us/what-we-do
4 PAGES High bounce rate 6. Listings (Example:http://www.nyc-officespace-leader.com/listings/305-fifth-avenue-office-suite-1340sf)
300 PAGES High bounce rate (65%), thin content 7. Buildings (Example:http://www.nyc-officespace-leader.com/928-broadway
300 PAGES Very high bounce rate (exceeding 75%) Most of the listing pages do not have more than 100 words. My SEO firm is advising me to set them "No-Index, Follow". They believe the thin content could be hurting me. Is this an acceptable strategy? I am concerned that when Google detects 300 pages set to "No-Follow" they could interpret this as the site seeking to hide something and penalize us. Also, the building pages have a low click thru rate. Would it make sense to set them to "No-Follow" as well? Basically, would it increase authority in Google's eyes if we set pages that have thin content and/or low click thru rates to "No-Follow"? Any harm in doing this for about half the pages on the site? I might add that while I don't suffer from any manual penalty volume has gone down substantially in the last month. We upgraded the site in early June and somehow 175 pages were submitted to Google that should not have been indexed. A removal request has been made for those pages. Prior to that we were hit by Panda in April 2012 with search volume dropping from about 7,000 per month to 3,000 per month. Volume had increased back to 4,500 by April this year only to start tanking again. It was down to 3,600 in June. About 30 toxic links were removed in late April and a disavow file was submitted with Google in late April for removal of links from 80 toxic domains. Thanks in advance for your responses!! Alan0 -
Is it cloaking/hiding text if textual content is no longer accessible for mobile visitors on responsive webpages?
My company is implementing a responsive design for our website to better serve our mobile customers. However, when I reviewed the wireframes of the work our development company is doing, it became clear to me that, for many of our pages, large parts of the textual content on the page, and most of our sidebar links, would no longer be accessible to a visitor using a mobile device. The content will still be indexable, but hidden from users using media queries. There would be no access point for a user to view much of the content on the page that's making it rank. This is not my understanding of best practices around responsive design. My interpretation of Google's guidelines on responsive design is that all of the content is served to both users and search engines, but displayed in a more accessible way to a user depending on their mobile device. For example, Wikipedia pages have introductory content, but hide most of the detailed info in tabs. All of the information is still there and accessible to a user...but you don't have to scroll through as much to get to what you want. To me, what our development company is proposing fits the definition of cloaking and/or hiding text and links - we'd be making available different content to search engines than users, and it seems to me that there's considerable risk to their interpretation of responsive design. I'm wondering what other people in the Moz community think about this - and whether anyone out there has any experience to share about inaccessable content on responsive webpages, and the SEO impact of this. Thank you!
Web Design | | mmewdell0 -
Subdomains, duplicate content and microsites
I work for a website that generates a high amount of unique, quality content. This website though has had development issues with our web builder and they are going to separate the site into different subdomains upon launch. It's a scholarly site so the subdomains will be like history and science and stuff. Don't ask why aren't we aren't using subdirectories because trust me I wish we could. So we have to use subdomains and I'm wondering a couple questions. Will the duplication of coding, since all subdomains will have the same design and look, heavily penalize us and is there any way around that? Also if we generate a good amount of high quality content on each site could we link all those sites to our other site as a possible benefit for link building? And finally, would footer links, linking all the subdirectories, be a good thing to put in?
Web Design | | mdorville0 -
Duplicate H1 tag IF it holds SAME text?
Hello people, I know that majority of SEO gurus (?) claim that H1 tag should only be used once per page. In the landing page design I'm working with, we actually need to repeat our core message stated in H1 & H2 - at the bottom of the page. Now the question is: Can that in any way cause any ranking penalty from big G? In my eyes that is not attempt to over optimize page as it contains SAME info as the H1 & H2 at the top of the page. Confusing, so I'm hope that some SEO gurus here will share some light on this. Thanks in advance!
Web Design | | RetroOnline0 -
Duplicate home page /index.asp /index.php etc
We recently moved www.devoted2vntage.co.uk to shopify but seem to have multiple home page variants still in google index. I am concerned that these will be causing duplicate content. I have redirected the offending URLs below to www.devoted2vintage.co.uk/ and have set up a canonical URL but need an expect to tell me if I have taken the current steps and if not, exactly what I need to do. www.devoted2vintage.co.uk/index.php www.devoted2vintage.co.uk/index.htm www.devoted2vintage.co.uk/index.html www.devoted2vintage.co.uk/index.shtml www.devoted2vintage.co.uk/index.aspx www.devoted2vintage.co.uk/index.cfm www.devoted2vintage.co.uk/index.pl www.devoted2vintage.co.uk/index.asp
Web Design | | devoted2vintage0 -
Do drop caps impact the search value of your content?
A client of mine wants to include drop caps at the start of the first paragraph on the page because they think it looks nice. I found some css techniques for implementing this using a span on the first character to enlarge the size of just that character. First word of the first paragraph. Are there any seo concerns I should have for adding drop caps?
Web Design | | fivelinesmedia0