How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content, which may result in a bunch of them being de-indexed. Although they all look rubbish, some of them are ranking in search engines, and the traffic on a couple of them makes it clear that people who find these pages want more information on the school (almost everyone clicks the local information tab on the page). So I don't want to just get rid of all these pages; I want to add content to them.
But my question is...
If I were to make up, say, 5 templates of generic content, with fields replaced by the school's name, location and headteacher's name so that each page varies from the others, would that be enough for Google to realise that they are not similar pages and stop classing them as duplicates?
e.g. [School name] is a busy and dynamic school led by [headteacher's name], who achieves excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
-
Hi Virginia,
Maybe this Whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near-duplicate content: the kind of page that is easily generated by pulling fields out of a database and dropping the name, address and so on into placeholders.
Unique content is exactly that, unique, so this approach is probably not going to cut it. You could still pull certain elements from the database, such as the address, but you need to remove the duplicate blocks and keep the page simpler (like a business directory), and ideally add some unique elements to each page.
These kinds of pages often still rank for very specific queries. Another strategy is to build well-thought-out landing pages that link to pages like these, which have value for users but are not search friendly.
So, assess how well these pages work as landing pages from search: is the traffic arriving from search, or coming in from elsewhere? If it comes in from elsewhere, you could noindex these pages or block them in robots.txt. Then target the bigger search terms higher up the tree and create good search landing pages that link to these pages for users.
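As a rough illustration of the robots.txt option, here is a minimal Python sketch that checks which URLs a rule would block. The `Disallow` path is an assumption based on the URLs in the question, the `/jobs` URL is hypothetical, and Python's standard `urllib.robotparser` only approximates how a real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the Disallow path mirrors the
# Employer.aspx URLs quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Employer.aspx
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

employer_url = "http://www.eteach.com/Employer.aspx?EmpNo=26626"
other_url = "http://www.eteach.com/jobs"  # hypothetical landing page URL

# can_fetch() returns False when the rules disallow the URL for that agent.
employer_allowed = parser.can_fetch("Googlebot", employer_url)
other_allowed = parser.can_fetch("Googlebot", other_url)

print("employer page crawlable:", employer_allowed)
print("landing page crawlable:", other_allowed)

# The noindex alternative is a robots meta tag in each page's <head>.
NOINDEX_TAG = '<meta name="robots" content="noindex">'
```

The two options are usually alternatives rather than a pair: a page blocked in robots.txt is not crawled, so a noindex tag on that page would not be seen.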
This is a really good read to get a better handle on duplicate content types and the relevant strategies:
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight difference in those pages is the
tag and the sidebar info with school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to:
[School name] is a busy and dynamic school led by [headteacher's name], who achieves excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
If all you are doing is changing the [school name] and [location] etc., I'm sure Google will still flag these pages as duplicate content.
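To get a feel for why swapping a few fields changes so little, here is a small Python sketch that compares pages by their overlapping three-word sequences (shingles) and Jaccard similarity. This is a common textbook way to measure near-duplication, not Google's actual method, which is not public. The school names, the shortened template and the hand-written description are all made up for illustration.

```python
import re

# Shortened, hypothetical version of the template from the question.
TEMPLATE = (
    "{school} is a busy and dynamic school led by {head}. "
    "Located in {location}, {school} offers a wide range of experiences "
    "both in the classroom and through extra-curricular activities."
)

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows)."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two pages built from the same template with different (made-up) fields.
page_a = TEMPLATE.format(school="Oakfield Primary", head="Jane Smith", location="Leeds")
page_b = TEMPLATE.format(school="Riverside Academy", head="Tom Jones", location="Bristol")

# A made-up, individually written description for comparison.
page_unique = (
    "Oakfield Primary serves around 300 pupils on the edge of Leeds and is "
    "known locally for its forest school and after-school coding club."
)

templated_similarity = jaccard(shingles(page_a), shingles(page_b))
unique_similarity = jaccard(shingles(page_a), shingles(page_unique))

print(f"templated vs templated: {templated_similarity:.2f}")
print(f"templated vs hand-written: {unique_similarity:.2f}")
```

Only the text block is compared here. On the real pages the shared navigation, layout and code would add to the overlap, so the templated pages would look even more alike.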
Unique content is the best way. If there's not a lot of competition for the school name and the page has enough content about each individual school, head teacher, etc., then "templates" might work. You can try it out, but I'd say unique content is the best way. It's the nature of the beast with so many pages.
Hope this helps.
Robert