How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content, which may result in a bunch of them being de-indexed. Now, although they all look rubbish, some of them are ranking on search engines, and looking at the traffic on a couple of them, it's clear that people who find these pages want more information on the school (everyone seems to click on the local information tab on the page). So I don't want to just get rid of all these pages; I want to add content to them.
But my question is...
If I were to make up, say, 5 templates of generic content with different fields being replaced with the school's name, location, and headteacher's name so that they vary from other pages, would that be enough for Google to recognise that they are not similar pages and stop classing them as duplicates?
e.g. [School name] is a busy and dynamic school led by [headteacher's name] who achieves excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
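To see why swapping a few fields may not be enough, here is a rough sketch of how near-duplicate detection via word shingles works (the template text, school details, and field values are all made up for illustration; Google's actual algorithm is not public). Two pages built from the same template still share most of their word sequences:

```python
# Hypothetical illustration: two pages generated from one template, with only
# the bracketed fields swapped, remain mostly identical word-for-word.
TEMPLATE = ("{school} is a busy and dynamic school led by {head} who achieves "
            "excellence every year from Ofsted. Located in {town}, {school} "
            "offers a wide range of experiences both in the classroom and "
            "through extra-curricular activities.")

page_a = TEMPLATE.format(school="Hillcrest Primary", head="Ms. Jones", town="Leeds")
page_b = TEMPLATE.format(school="Oakwood Academy", head="Mr. Smith", town="York")

def shingles(text, k=3):
    """Return the set of k-word sequences ('shingles') in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by total distinct shingles."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

# Despite different names and locations, the two pages overlap heavily.
print(f"Similarity: {jaccard(page_a, page_b):.0%}")
```

Only the shingles that touch a swapped field differ, so a shingle-based comparison still scores the pages as highly similar — which is the core problem with fill-in-the-blank templates.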
-
Hi Virginia,
Maybe this Whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near-duplicate content: the kind of content that can easily be created by pulling fields out of a database, dynamically generating the pages, and dropping the name, address, etc. into the placeholders.
Unique content is essentially that: unique, so this approach is probably not going to cut it. You could still pull certain elements dynamically, such as the address, but you need to either remove these duplicate blocks and keep the pages simpler (like a business directory) or, ideally, add some unique elements to each page.
These kinds of pages often still rank for very specific queries. Another common strategy is to build well-thought-out landing pages that link to pages like these, which have value for users but are not search friendly.
So, assess how well these pages work as landing pages from search, or whether visitors arrive elsewhere first. If they arrive elsewhere, you could noindex these pages or block them in robots.txt. Then target the bigger search terms higher up the tree and create good search landing pages that link through to these pages for users.
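For reference, a noindex directive is set per page, typically in the `<head>` of each response (the exact markup below is illustrative):

```html
<!-- Keep the page out of the index, but still let crawlers follow its links -->
<meta name="robots" content="noindex, follow">
```

One caveat worth knowing: blocking a URL in robots.txt stops crawling entirely, so Google may never see a noindex tag on that page. Pick one mechanism per page rather than combining both.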
This is a really good read to get a better handle on duplicate content types and the relevant strategies:
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight differences between those pages are the heading and the sidebar info with the school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to:
[School name] is a busy and dynamic school led by [headteacher's name] who achieves excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
If all you are doing is changing the [school name] and [location], etc., I'm sure Google will still flag these pages as duplicate content.
Unique content is the best way. If there's not a lot of competition for the school name, and each page has enough content about the individual school, head teacher, etc., then "templates" might work. You can try it out, but I'd say unique content is the best way; it's the nature of the beast with so many pages.
Hope this helps.
Robert