How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content, which may result in a bunch of them being de-indexed. Now, although they all look rubbish, some of them are ranking in search engines, and looking at the traffic on a couple of them, it's clear that people who land on these pages want to find out more about the school (almost everyone clicks the local information tab on the page). So I don't want to just get rid of all these pages; I want to add content to them.
But my question is...
If I were to make up, say, 5 templates of generic content with fields replaced by the school's name, location, and headteacher's name so that the pages vary from one another, would that be enough for Google to recognise that they are not similar pages and stop classing them as duplicates?
e.g. [School name] is a busy and dynamic school led by [headteacher's name], and achieves excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities; we encourage all of our pupils to “Aim Higher”. We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
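To make the mail-merge idea concrete, here is a minimal Python sketch of the approach being proposed; the field names and sample records are invented for illustration:

```python
# Minimal sketch of the proposed template-fill approach.
# Field names and sample records are invented for illustration.

TEMPLATE = (
    "{school} is a busy and dynamic school led by {head}, and achieves "
    "excellence every year from Ofsted. Located in {location}, {school} "
    "offers a wide range of experiences both in the classroom and through "
    "extra-curricular activities."
)

schools = [
    {"school": "Hillside Primary", "head": "Mrs Jones", "location": "Leeds"},
    {"school": "Oakwood Academy", "head": "Mr Patel", "location": "Bristol"},
]

for record in schools:
    # str.format(**record) swaps each {placeholder} for the record's value.
    print(TEMPLATE.format(**record))
```

Every page generated from the same template shares almost every word with its siblings; only the placeholder tokens change.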
-
Hi Virginia,
Maybe this Whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near-duplicate content: pages that can easily be created by pulling fields out of a database, generating the pages dynamically, and dropping the name, address, etc. into placeholders.
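How similar is "similar"? Google's exact signals aren't public, but the textbook way to detect near duplicates is shingling: break each page's text into overlapping word n-grams and measure the Jaccard overlap between the sets. A rough Python sketch (the shingle width is illustrative, not Google's actual value):

```python
# Textbook near-duplicate detection via w-shingling and Jaccard similarity.
# Google's real signals are not public; this is only the classic approach.

def shingles(text, w=4):
    """Set of overlapping w-word shingles in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """|A & B| / |A | B| -- 1.0 means identical shingle sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

page_a = ("Hillside Primary is a busy and dynamic school led by Mrs Jones "
          "and achieves excellence every year from Ofsted.")
page_b = ("Oakwood Academy is a busy and dynamic school led by Mr Patel "
          "and achieves excellence every year from Ofsted.")

# Even these two short sentences share roughly 40% of their shingles; with a
# full page of boilerplate and only a few tokens swapped, the score climbs
# toward 1.0.
print(jaccard(shingles(page_a), shingles(page_b)))
```

That is why swapping the name and location alone won't move the needle: the shingle sets of any two pages built from the same template remain almost identical.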
Unique content is essentially that: unique. So this approach is probably not going to cut it. You could still pull certain elements like the address from the database, but you need to either strip out the duplicated blocks and keep the pages simpler (like a business directory), or, ideally, add some genuinely unique elements to each page.
These kinds of pages often still rank for very specific queries. Another common strategy is to build well-thought-out landing pages that target the bigger terms and link down to pages like these, which have value for users but are not search friendly.
So, assess whether these pages actually work as landing pages from search, or whether visitors arrive some other way. If traffic comes in elsewhere, you could noindex these pages or block them in robots.txt, then target the bigger search terms higher up the tree with good landing pages that link down to these pages for users.
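For reference, "noindexing" a page normally means adding a robots meta tag to its head, along the lines of the snippet below. Note that the two options are not interchangeable: the meta tag only works if Google can still crawl the page, whereas a robots.txt Disallow stops crawling but won't by itself remove URLs that are already indexed.

```html
<!-- In the <head> of each page you want dropped from the index -->
<meta name="robots" content="noindex, follow">
```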
This is a really good read to get a better handle on duplicate content types and the relevant strategies:
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight differences between those pages are the heading tag and the sidebar info with the school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to:
[School name] is a busy and dynamic school led by [headteacher's name], and achieves excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities; we encourage all of our pupils to “Aim Higher”. We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
If all you are doing is changing the [school name] and [location] etc., I'm sure Google will still flag these pages as duplicate content.
Unique content is the best way to go. If there's not a lot of competition for the school name and each page has enough content about the individual school, headteacher, etc., then "templates" might work. You can try it out, but I'd still put my money on unique content. It's the nature of the beast with so many pages.
Hope this helps.
Robert