How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content which may result in a bunch of these pages being de-indexed. Now although they all look rubbish, some of them are ranking on search engines, and looking at the traffic on a couple of these, it's clear that people who find these pages are wanting to find out more information on the school (because everyone seems to click on the local information tab on the page). So I don't want to just get rid of all these pages, I want to add content to them.
But my question is...
If I were to make up say 5 templates of generic content with different fields being replaced with the schools name, location, headteachers name so that they vary with other pages, will this be enough for Google to realise that they are not similar pages and will no longer class them as duplicate pages?
e.g. [School name] is a busy and dynamic school led by [headteachers name] who achieve excellence every year from ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, we encourage all of our pupils to “Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
-
Hi Virginia,
Maybe this whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near duplicates and is the kind of content that can easily be created by pulling fields out of a database and dynamically creating the pages and dropping name, address etc into the placeholders.
Unique content is essentially that, unique content so this approach is probably not going to cut it. You could have certain elements pulled like this such as the address but you need to either remove these duplicate blocks and keep it more simple (like a business directory) and ideally add some unique elements to each page.
These kinds of pages often still rank for very specific queries and also often well thought out landing pages that link to pages like this that have value for users but are not search friendly can be a strategy.
So, assess how well these work as landing pages from search or are they coming in elsewhere? If they come in elsewhere you could no index these pages or block them in robots.txt. Then, target the bigger search terms higher up the tree and create good search landing pages that link to these other pages for users.
This is a real good read to get a better handle on duplicate content types and the relevant strategies:
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight difference in those pages is the
tag and the sidebar info with school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to:
[School name] is a busy and dynamic school led by [headteachers name] who achieve excellence every year from ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, we encourage all of our pupils to “Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
If all you are doing is changing the [school name] ans [location] etc, I'm sure Google will still flag these pages as duplicate content.
Unique content is the best way. If theres not a lot of competition for the school name and the page has enough content about each individual school, head teacher etc, then "templates" might work. You can try it out but I'd say unique content is the best way. It's the nature of the beast with so many pages.
Hope this helps.
Robert
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Organic Ranking & Traffic Dropped
Hello, We have been struggling to keep our website (http://goo.gl/vS37qA) ranking well in Google since April 30, 2015. For some reason at that time, there were around 15000 blocked pages (mainly Magento layered navigation pages) showing in Google's Search Console. We used canonical tags, and now all these pages have been removed from Google's index and Google Search Console. We didn't do anything that is against Google's Guidelines. Currently in Google Search Console we see:- Around 50 crawl errors- no malware- no blocked pages - no other error messages in both Webmasters tool.We have never practiced black hat SEO, paid for links, or used tactics that Google penalizes. We noticed in the last few months there are around 1000 Chinese/Russian/Japanese links points to our website, and we have used the disavow tool to notify Google of these attacks.Any help would be greatly appreciated in advance!
White Hat / Black Hat SEO | | NancyH0 -
Preventing CNAME Site Duplications
Hello fellow mozzers! Let me see if I can explain this properly. First, our server admin is out of contact at the moment,
White Hat / Black Hat SEO | | David-Kley
so we are having to take this project on somewhat blind. (forgive the ignorance of terms). We have a client that needs a cname record setup, as they need a sales.DOMAIN.com to go to a different
provider of data. They have a "store" platform that is hosted elsewhere and they require a cname to be
sent to a custom subdomain they set up on their end. My question is, how do we prevent the cname from being indexed along with the main domain? If we
process a redirect for the subdomain, then the site will not be able to go out and grab the other providers
info and display it. Currently, if you type in the sales.DOMAIN.com it shows the main site's homepage.
That cannot be allow to take place as we all know, having more than one domain with
exact same content = very bad for seo. I'd rather not rely on Google to figure it out. Should we just have the cname host (where its pointing at) add a robots rule and have it set to not index
the cname? The store does not need to be indexed, as the items are changed almost daily. Lastly, is an A record required for this type of situation in any way? Forgive my ignorance of subdomains, cname records and related terms. Our server admin being
unavailable is not helping this project move along any. Any advice on the best way to handle
this would be very helpful!0 -
How to re-rank an established website with new content
I can't help but feel this is a somewhat untapped resource with a distinct lack of information.
White Hat / Black Hat SEO | | ChimplyWebGroup
There is a massive amount of information around on how to rank a new website, or techniques in order to increase SEO effectiveness, but to rank a whole new set of pages or indeed to 're-build' a site that may have suffered an algorithmic penalty is a harder nut to crack in terms of information and resources. To start I'll provide my situation; SuperTED is an entertainment directory SEO project.
It seems likely we may have suffered an algorithmic penalty at some point around Penguin 2.0 (May 22nd) as traffic dropped steadily since then, but wasn't too aggressive really. Then to coincide with the newest Panda 27 (According to Moz) in late September this year we decided it was time to re-assess tactics to keep in line with Google's guidelines over the two years. We've slowly built a natural link-profile over this time but it's likely thin content was also an issue. So beginning of September up to end of October we took these steps; Contacted webmasters (and unfortunately there was some 'paid' link-building before I arrived) to remove links 'Disavowed' the rest of the unnatural links that we couldn't have removed manually. Worked on pagespeed as per Google guidelines until we received high-scores in the majority of 'speed testing' tools (e.g WebPageTest) Redesigned the entire site with speed, simplicity and accessibility in mind. Htaccessed 'fancy' URLs to remove file extensions and simplify the link structure. Completely removed two or three pages that were quite clearly just trying to 'trick' Google. Think a large page of links that simply said 'Entertainers in London', 'Entertainers in Scotland', etc. 404'ed, asked for URL removal via WMT, thinking of 410'ing? Added new content and pages that seem to follow Google's guidelines as far as I can tell, e.g;
Main Category Page Sub-category Pages Started to build new links to our now 'content-driven' pages naturally by asking our members to link to us via their personal profiles. We offered a reward system internally for this so we've seen a fairly good turnout. Many other 'possible' ranking factors; such as adding Schema data, optimising for mobile devices as best we can, added a blog and began to blog original content, utilise and expand our social media reach, custom 404 pages, removed duplicate content, utilised Moz and much more. It's been a fairly exhaustive process but we were happy to do so to be within Google guidelines. Unfortunately, some of those link-wheel pages mentioned previously were the only pages driving organic traffic, so once we were rid of these traffic has dropped to not even 10% of what it was previously. Equally with the changes (htaccess) to the link structure and the creation of brand new pages, we've lost many of the pages that previously held Page Authority.
We've 301'ed those pages that have been 'replaced' with much better content and a different URL structure - http://www.superted.com/profiles.php/bands-musicians/wedding-bands to simply http://www.superted.com/profiles.php/wedding-bands, for example. Therefore, with the loss of the 'spammy' pages and the creation of brand new 'content-driven' pages, we've probably lost up to 75% of the old website, including those that were driving any traffic at all (even with potential thin-content algorithmic penalties). Because of the loss of entire pages, the changes of URLs and the rest discussed above, it's likely the site looks very new and probably very updated in a short period of time. What I need to work out is a campaign to drive traffic to the 'new' site.
We're naturally building links through our own customerbase, so they will likely be seen as quality, natural link-building.
Perhaps the sudden occurrence of a large amount of 404's and 'lost' pages are affecting us?
Perhaps we're yet to really be indexed properly, but it has been almost a month since most of the changes are made and we'd often be re-indexed 3 or 4 times a week previous to the changes.
Our events page is the only one without the new design left to update, could this be affecting us? It potentially may look like two sites in one.
Perhaps we need to wait until the next Google 'link' update to feel the benefits of our link audit.
Perhaps simply getting rid of many of the 'spammy' links has done us no favours - I should point out we've never been issued with a manual penalty. Was I perhaps too hasty in following the rules? Would appreciate some professional opinion or from anyone who may have experience with a similar process before. It does seem fairly odd that following guidelines and general white-hat SEO advice could cripple a domain, especially one with age (10 years+ the domain has been established) and relatively good domain authority within the industry. Many, many thanks in advance. Ryan.0 -
SEO from INDIA Smarter then Google?
Exploring this url: filterscanada.ca In Open Site Explorer, it is clear those guys bought one of those package available on site like eLance.com a low price over seas!!! Where freelancer around the word can be hired for a fews bucks. For example, I post a job on eLance for SEO and most of the freelancer submitting where from INDIA. For just a few hundreds dollars, you can get a complet SEO package. At first, the price was attractive, but when posting on seoMoz and doing research, I came to the conclusion, the techniques they use might hurt more the produce positive result... How can you get a D.A. of 45 using backlinks they get? I read all those things about Google algorithm, and Panda and Penguin and this and that... Being impossible to crack! Do you have a explanation? I work really hard ans spends lots of $$$ to have a clean site selling furnace filters
White Hat / Black Hat SEO | | BigBlaze205
I follow all the SEO guide lines, practice only white hat trying to built somethings, but with a P.A. of 19 and a competitors ranking like this, I ask myself: "Maybe INDIA is smarter then Google and I should do like this site, spend a couple of hundreds dollars and buy myself a high D.A."0 -
Links via scraped / cloned content
Just been looking at some backlinks on a site - a good proportion of them are via Scraped wikipedia links or sites with similar directories to those found on DMOZ (just they have different names). To be honest, many of these sites look pretty dodgy to me, but if they're doing illegal stuff there's absolutely no way I'll be able to get links removed. Should I just sit and watch the backlinks increase from these questionable sources, or report the sites to Google, or do something else? Advice please.
White Hat / Black Hat SEO | | McTaggart0 -
Possibly a dumb question - 301 from a banned domain to new domain with NEW content
I was wondering if banned domains pass any page rank, link love, etc. My domain got banned and I AM working to get it unbanned, but in the mean time, would buying a new domain, and creating NEW content that DOES adhere to the google quality guidelines, help at all? Would this force an 'auto-evaluation' or 're-evaluation' of the site by google? or would the new domain simply have ZERO effect from the 301 unless that old domain got into google's good graces again.
White Hat / Black Hat SEO | | ilyaelbert0 -
Has anyone seen this kind of google cache spam before?
Has anyone seen this kind of 'hack'? When looking at a site recently I found the Google cache version (from 28 Oct) strewn with mentions of all sorts of dodgy looking pharma products but the site itself looked fine. The site itself is www.istc.org.uk Looking in the source of the pages you can see the home pages contains: Browsing as googlebot showed me an empty page (though msnbot etc. returned a 'normal' non-pharma page). As a mildly amusing aside - when I tried to tell the istc about this, the person answering the phone clearly didn't believe me and couldn't get me off the line fast enough! Needless to say they haven't fixed it a week after being told.
White Hat / Black Hat SEO | | JaspalX0 -
Why is this ranking first in Google Places for this term....?
"Best Bar In Chicago" - http://www.google.com/search?gcx=w&sourceid=chrome&ie=UTF-8&q=best+bar+in+chicago They have only 5 Google reviews, and their local directory reviews are suspect. One of them goes to rateclubs.com and it's not even a page for their business, while one of them doesn't have user reviews, it's just an editorial review. The other one at superpages.com doesn't even link back to their site, it links to their restaurants.com profile. What is going on here? I've been trying to figure this out for a while as their first place ranking has been solidified for quite some time now. I can also tell you that a few of the bars listed below them have a MUCH higher profile and are better known. You can see that just by the reviews.
White Hat / Black Hat SEO | | MichaelWeisbaum0