How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content, which may result in a bunch of them being de-indexed. Now, although they all look rubbish, some of them are ranking in search engines, and looking at the traffic on a couple of them, it's clear that people who find these pages want more information on the school (because everyone seems to click on the local information tab). So I don't want to just get rid of all these pages; I want to add content to them.
But my question is...
If I were to make up, say, 5 templates of generic content with different fields replaced by the school's name, location, and headteacher's name so that they vary from other pages, would that be enough for Google to realise they are not similar pages and stop classing them as duplicates?
e.g. [School name] is a busy and dynamic school led by [headteacher's name], who achieves excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to hold [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
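For concreteness, here's a minimal sketch of the kind of mail-merge templating I mean — the field names and school data are made up for illustration:

```python
# A minimal sketch of filling a generic template from database fields.
# Field names and school data are hypothetical.

TEMPLATE = (
    "{school} is a busy and dynamic school led by {head}. Located in "
    "{location}, {school} offers a wide range of experiences both in the "
    "classroom and through extra-curricular activities."
)

schools = [
    {"school": "Oakwood Primary", "head": "Mrs Smith", "location": "Leeds"},
    {"school": "Riverside Academy", "head": "Mr Jones", "location": "York"},
]

# Every generated page shares the same surrounding text; only the
# substituted fields differ from page to page.
for row in schools:
    print(TEMPLATE.format(**row))
```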
-
Hi Virginia,
Maybe this Whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near-duplicate content: the kind of content that can easily be created by pulling fields out of a database, dynamically generating the pages, and dropping the name, address, etc. into placeholders.
Unique content is exactly that: unique. So this approach is probably not going to cut it. You can still pull certain elements this way, such as the address, but you need to either remove these duplicated blocks and keep the pages simpler (like a business directory) or, ideally, add some genuinely unique elements to each page.
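To see why, here's a rough sketch of shingle-based near-duplicate detection, one technique search engines are widely believed to use (Google's exact pipeline isn't public). Two template-filled pages share almost all of their word n-grams, so they score as near duplicates even though the names differ:

```python
# Rough sketch: shingle overlap between two template-filled pages.

def shingles(text, n=3):
    """Return the set of word n-grams ("shingles") in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b)

page_a = ("Oakwood School is a busy and dynamic school led by Mrs Smith "
          "who achieves excellence every year from Ofsted.")
page_b = ("Riverside School is a busy and dynamic school led by Mr Jones "
          "who achieves excellence every year from Ofsted.")

similarity = jaccard(shingles(page_a), shingles(page_b))
print(f"Similarity: {similarity:.2f}")
# Prints 0.55 — over half the shingles are shared, despite the
# different school and headteacher names.
```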
These kinds of pages often still rank for very specific queries. Another strategy is to build well-thought-out landing pages that link to pages like these, which have value for users but are not search friendly.
So, assess how these pages are performing: do they work as landing pages from search, or is the traffic coming in elsewhere? If it comes in elsewhere, you could noindex these pages or block them in robots.txt (a sketch of both options follows below). Then target the bigger search terms higher up the tree and create good search landing pages that link to these pages for users.
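If you do decide the traffic isn't coming from search, here's a quick sketch of the two blocking options. Use one or the other, not both: if robots.txt blocks crawling, Google will never see a noindex tag on the page.

```
# robots.txt (at the site root) — stops crawling of every employer detail
# page (prefix matching also covers ?EmpNo=... query strings):
User-agent: *
Disallow: /Employer.aspx

# Alternatively, leave the pages crawlable and add this to each page's
# <head>; it asks Google to drop the page from the index but still
# follow its links:
#   <meta name="robots" content="noindex, follow">
```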
This is a really good read to get a better handle on duplicate content types and the relevant strategies:
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight differences between those pages are the title tag and the sidebar info with the school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to:
[School name] is a busy and dynamic school led by [headteacher's name], who achieves excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to hold [school name]'s reputation to the highest standards.
If all you are doing is changing the [school name] and [location] etc., I'm sure Google will still flag these pages as duplicate content.
Unique content is the best way. If there's not a lot of competition for the school name and the page has enough content about each individual school, headteacher, etc., then "templates" might work. You can try it out, but I'd still say unique content is the way to go; it's the nature of the beast with so many pages.
Hope this helps.
Robert