How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content, which may result in a bunch of them being de-indexed. Now, although they all look rubbish, some of them rank in search engines, and looking at the traffic on a couple of them it's clear that people who land on these pages want to find out more about the school (almost everyone clicks the local information tab on the page). So I don't want to just get rid of all these pages, I want to add content to them.
But my question is...
If I were to make up, say, 5 templates of generic content, with fields replaced by the school's name, location, and headteacher's name so that each page varies from the others, would this be enough for Google to realise they are not similar pages and stop classing them as duplicates?
e.g. [School name] is a busy and dynamic school led by [headteacher's name], who achieves excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
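To give a sense of how little would actually change between pages, here's a rough sketch (nothing to do with our real code, and the school details are invented) of one of these templates filled in for two schools:

```python
# Rough illustration only - fill one generic template for two invented
# schools and count how many words actually differ between the two pages.
TEMPLATE = ("{school} is a busy and dynamic school led by {head}, who achieves "
            "excellence every year from Ofsted. Located in {location}, {school} "
            "offers a wide range of experiences both in the classroom and through "
            "extra-curricular activities, and we encourage all of our pupils to "
            '"Aim Higher". We value all our teachers and support staff and work '
            "hard to keep {school}'s reputation to the highest standards.")

schools = [  # made-up example data
    {"school": "Example Primary", "head": "Mrs Smith", "location": "Leeds"},
    {"school": "Sample High", "head": "Mr Jones", "location": "Bristol"},
]

page_a = TEMPLATE.format(**schools[0]).split()
page_b = TEMPLATE.format(**schools[1]).split()

identical = sum(1 for a, b in zip(page_a, page_b) if a == b)
print(f"{identical}/{len(page_a)} words identical")
# Roughly 85-90% of the words come out identical, which is why pages built
# this way still tend to read as near-duplicates of each other.
```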
-
Hi Virginia,
Maybe this Whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near duplicates: the kind of content that is easily created by pulling fields out of a database, dynamically generating the pages, and dropping the name, address etc. into placeholders.
Unique content is essentially that, unique, so this approach is probably not going to cut it. You can still pull certain elements in like this, such as the address, but you need to either remove these duplicated blocks and keep the pages simpler (like a business directory) or, ideally, add some genuinely unique elements to each page.
These kinds of pages often still rank for very specific queries. Another common strategy is to build well thought out landing pages that target search, and have them link down to pages like these, which have value for users but are not search friendly.
So, assess how well these work as landing pages from search, or whether visitors are coming in elsewhere. If they come in elsewhere, you could noindex these pages or block them in robots.txt, then target the bigger search terms higher up the tree and create good search landing pages that link down to these pages for users.
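For reference, those two options look something like the sketch below. Pick one or the other: a page blocked in robots.txt can't be crawled, so Google would never see a noindex tag on it.

```
<!-- Option 1: on each page you want dropped from the index -->
<meta name="robots" content="noindex, follow">
```

```
# Option 2: robots.txt - stops crawling of every Employer.aspx URL instead
User-agent: *
Disallow: /Employer.aspx
```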
This is a really good read to get a better handle on duplicate content types and the relevant strategies:
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight difference between those pages is the <title> tag and the sidebar info with the school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to your example above, and all you are doing is changing the [school name] and [location] etc., I'm sure Google will still flag these pages as duplicate content.
Unique content is the best way. If there's not a lot of competition for the school name and the page has enough content about each individual school, head teacher etc., then "templates" might work. You can try it out, but I'd still say unique content is the best way; it's the nature of the beast with so many pages.
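Nobody outside Google knows exactly how its duplicate detection works, but the general family of techniques compares overlapping word sequences ("shingles") across the whole page. A simplified sketch with stand-in text shows why swapping a few field values barely moves the needle:

```python
# Simplified sketch of shingle-based near-duplicate scoring - NOT Google's
# actual (unpublished) algorithm, just the general idea of comparing
# overlapping word n-grams ("shingles") between two pages.

def shingles(text, n=5):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Stand-in for the identical template, nav, footer etc. shared by every page:
shared_chrome = " ".join(f"sharedword{i}" for i in range(300))

page_a = shared_chrome + " Example Primary is a busy and dynamic school in Leeds"
page_b = shared_chrome + " Sample High is a busy and dynamic school in Bristol"

score = jaccard(shingles(page_a), shingles(page_b))
print(f"similarity: {score:.2f}")
# Roughly 0.95 - when nearly everything on the page is shared template,
# changing a handful of field values barely lowers the score, so the
# pages still look like near-duplicates.
```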
Hope this helps.
Robert