Duplicate content
-
I run about 10 sites, and most of them seem to have fallen foul of the Penguin update. Even though I have never sought inorganic links, I have been frantically searching for a link-based answer since April.
However, since asking a question here, I have been pointed in another direction by one of your contributors. It seems at least 6 of my sites have duplicate content issues.
If you search Google for "We have selected nearly 200 pictures of short haircuts and hair styles in 16 galleries", which is the first bit of text from the site short-hairstyles.com, about 30000 results appear. I don't know where they're from, nor why anyone would want to do this. I presume it's automated since there is so much of it.
I have decided to redo the content. So I guess (hope) at some point in the future the duplicate nature will be flushed from Google's index?
But how do I prevent it happening again? It's impractical to redo the content every month or so.
For example, if you search for "This facility is written in Flash to use it you need to have Flash installed." from another of my sites, where I coincidentally uploaded a new page a couple of days ago, only the duplicate content shows up, not my original site. So whoever is doing this is finding new stuff on my site and getting it indexed on Google before Google even sees it on my site!
Thanks,
Ian
-
I don't have any experience with Cloudflare so I can't offer an opinion on their services. And without a proper audit of your site and link profile, there is no honest way to know exactly what the core issues are on the site. Short of a proper audit, it's all a guess. That's the bigger concern.
Maybe it's links. Maybe it's duplicate content perception. Maybe it's a dozen seemingly insignificant issues that accumulated to the breaking point with a trigger event like Penguin.
Unfortunately that's the reality of SEO in 2012.
-
OK, maybe I'm not getting something or not explaining myself properly.
When I say things like "30000 times", "every page" and "it is the majority of the content", in the context that I have in my head I'm saying it's not a trivial thing and I have looked into it at length.
If you thought there was some verification needed to answer the question the information is there to have a look.
Complex things are made up of lots of uncomplex things.
How strong is this site? Up until April I'd say very strong. It came in at number 1 for several high-volume keywords (and still does in Bing and Yahoo).
As I said in the original question I have decided to redo most of the content on this site anyway so whether this whole issue is an issue or not isn't an issue.
The original question was: how do you prevent it happening again? Are rel-author, rel-publisher and Google+ the answer?
Or what about this? http://www.cloudflare.com/plans
-
"It is the majority of my content." That's what I asked originally: whether it is the majority of content on individual pages. If that's true, it could be a cause of problems. However, SEO is an extremely complex process with multiple algorithms, so unfortunately, without a detailed review of the site, it's dangerous to assume that specific issue is the cause of your problems.
How strong is your site in other regards? Do you implement rel-author or rel-publisher code and tie it to a Google+ account to communicate you're the original source? Do you have enough other trust signals in place? There are many other similar questions that need to be answered before anyone can confidently make serious recommendations.
-
1. Google doesn't seem to know this and has penalised my sites for something.
2. It is the majority of the content. It's pretty much all of it, up to 30000 times.
3. I've lost 70% of my traffic via recent Google updates. That is THE overwhelming concern, which is why I came and joined this site.
I arrived at this point by asking this question: http://www.seomoz.org/q/penguin-issues If you disagree with the track I got sent on, can you suggest a different one?
-
1. You're not generating the duplicate content, so there's nothing you can logically do about it at any kind of scalable frequency, let alone prevent it.
2. If it's not the majority of content on a page, it's not a serious problem. In fact, it's common to the internet.
3. Don't allow non-issues to become an overwhelming concern. Focus on what you can do something about, and on things that are more important, really do have a negative impact on your SEO, and are within your control.
-
OK, but the snippet is an exact match (in speech marks) and there are 30000 of them. That's not just monkeys typing Shakespeare. Every page (300 or so) on that site has unique content, and more or less each page has up to 30000 duplicates: most a lot less than 30000, but a lot more than 1, which is what it should be. If there were a couple of coincidences, fine, but there aren't.
-
Just finding a snippet that's as short as the examples you gave is not a reason to be concerned about duplicate content in itself. A typical page should have hundreds of words and rank for whatever phrase or phrases you care about, not for a single sentence within the content.
If, on the other hand, you have the overwhelming majority of the content from one of your pages duplicated, that's a reason to be concerned.
So - how much content do you have on YOUR site on the page(s) in question? And have you checked to find out if the majority is duplicated? That's where the focus needs to be.
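One rough way to check whether the majority of a page is duplicated, rather than a single sentence, is to compare overlapping word sequences ("shingles") between your page text and a suspected copy. The following is only a minimal sketch in Python: the function names and the 5-word shingle size are assumptions for illustration, not anything Google or Moz is known to use, and it expects you to have already extracted the plain text of both pages.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word n-grams (shingles) found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def duplicated_fraction(original, suspect, size=5):
    """Fraction of the original page's shingles that also appear in the suspect page."""
    own = shingles(original, size)
    if not own:
        return 0.0
    return len(own & shingles(suspect, size)) / len(own)


if __name__ == "__main__":
    # Hypothetical texts; in practice these would be the full visible text of each page.
    mine = "We have selected nearly 200 pictures of short haircuts and hair styles in 16 galleries"
    copy = "Lots of unrelated text. We have selected nearly 200 pictures of short haircuts today"
    print(f"{duplicated_fraction(mine, copy):.0%} of the original appears in the copy")
```

A value near 1.0 for many pages would support the "majority of the content" concern; a low value would suggest only snippets are being lifted.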
Related Questions
-
Manage category pages and duplicate content issues
Hi everybody, I am now auditing this website: www.disfracessimon.com
This website has some issues with canonicals and other things, but right now I have found something that I would like your opinion on. When I was checking parts of the content in Google to find duplicate content issues, I found this: in Google I searched for "Chaleco de streck decorado con botones" and found: First result: "Hombre trovador", which is the one I was checking -> Correct
The following results are category pages where the product is listed. I was wondering if this could cause any problem related to duplicate content. Should I noindex the category pages or should I keep them?
The first result in Google was the product page. And category pages, I think, are good for link juice transfer and for capturing some searches from Google. Any advice? Thank you
Intermediate & Advanced SEO | teconsite
-
Duplicate content issue with pages that have navigation
We have a large consumer website with several sections that have navigation of several pages. How would I prevent the pages from getting duplicate content errors and how best would I handle SEO for these? For example we have about 500 events with 20 events showing on each page. What is the best way to prevent all the subsequent navigation pages from getting a duplicate content and duplicate title error?
Intermediate & Advanced SEO | roundbrix
-
Glossary index and individual pages create duplicate content. How much might this hurt me?
I've got a glossary on my site with an index page for each letter of the alphabet that has a definition. So the M section lists every definition (the whole definition). But each definition also has its own individual page (and we link to those pages internally so the user doesn't have to hunt down the entire M page). So I definitely have duplicate content ... 112 instances (112 terms). Maybe it's not so bad because each definition is just a short paragraph(?) How much does this hurt my potential ranking for each definition? How much does it hurt my site overall? Am I better off making the individual pages no-index? or canonicalizing them?
Intermediate & Advanced SEO | LeadSEOlogist
-
Duplicate Content... Really?
Hi all, My site is www.actronics.eu. Moz reports virtually every product page as duplicate content, flagged as HIGH PRIORITY! I know why: Moz classes a page as duplicate if more than 95% of its content/code is similar. There's very little I can do about this because, although our products are different, the content is very similar, apart from a few part numbers and the vehicle make/model. Here's an example:
http://www.actronics.eu/en/shop/audi-a4-8d-b5-1994-2000-abs-ecu-en/bosch-5-3
http://www.actronics.eu/en/shop/bmw-3-series-e36-1990-1998-abs-ecu-en/ate-34-51
Now, multiply this by ~2,000 products x 7 different languages and you'll see we have a big dupe content issue (according to Moz's Crawl Diagnostics report). I say "according to Moz..." as I do not know if this is actually an issue for Google. 90% of our product pages rank, albeit some much better than others. So what is the solution? We're not trying to deceive Google in any way, so it would seem unfair to be hit with a dupe content penalty. This is a legit dilemma where our products differ by as little as a part number. One ugly solution would be to remove the header / sidebar / footer on our product pages, as I've demonstrated here: http://woodberry.me.uk/test-page2-minimal-v2.html This removes A LOT of page bloat (code) and would bring the pages down to 80% duplicate.
(This is the tool I'm using for checking: http://www.webconfs.com/similar-page-checker.php) Other "prettier" solutions would be greatly appreciated. I look forward to hearing your thoughts. Thanks,
Woody 🙂
Intermediate & Advanced SEO | seowoody
-
Duplicate Page Content Errors on Moz Crawl Report
Hi All, I seem to be losing a 'firefighting' battle with regards to various errors being reported on the Moz crawl report relating to: Duplicate Page Content, Missing Page Title, Missing Meta, Duplicate Page Title. While I acknowledge that some of the errors are valid (and we are working through them), I find some of them difficult to understand... Here is an example of a 'duplicate page content' error being reported:
http://www.bolsovercruiseclub.com (which is obviously our homepage)
is reported to have 'duplicate page content' compared with the following pages:
http://www.bolsovercruiseclub.com/guides/gratuities
http://www.bolsovercruiseclub.com/cruise-deals/cruise-line-deals/holland-america-2014-offers/?order_by=brochure_lead_difference
http://www.bolsovercruiseclub.com/about-us/meet-the-team/craig
All 3 of those pages are completely different, hence my confusion... This is just a solitary example, there are many more! I would be most interested to hear what people's opinions are... Many thanks Andy
Intermediate & Advanced SEO | TomKing
-
Can a website be punished by panda if content scrapers have duplicated content?
I've noticed recently that a number of content scrapers are linking to one of our websites and have the duplicate content on their web pages. Can content scrapers affect the original website's ranking? I'm concerned that having duplicated content, even if hosted by scrapers, could be a bad signal to Google. What are the best ways to prevent this happening? I'd really appreciate any help as I can't find the answer online!
Intermediate & Advanced SEO | RG_SEO
-
Duplicate Content Error because of passed through variables
Hi everyone... When getting our weekly crawl of our site from SEOMoz, we are getting errors for duplicate content. We generate pages dynamically based on variables we carry through the URLs, like:
http://www.example123.com/fun/life/1084.php
http://www.example123.com/fun/life/1084.php?top=true
i.e., ?top=true is the variable being passed through. We are a large site (approx. 7000 pages) so obviously we are getting many of these duplicate content errors in the SEOMoz report. Question: Are the search engines also penalizing for duplicate content based on variables being passed through? Thanks!
Intermediate & Advanced SEO | CTSupp
-
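For the passed-through variable case, one commonly suggested approach is to decide on a single preferred URL per page and reference it from a rel="canonical" link element on every variant. The Python sketch below only illustrates how such a preferred URL could be computed. It assumes that `top` changes presentation and not content; the `IGNORED_PARAMS` set and the `canonical_url` helper are hypothetical names, not part of any SEOMoz or search engine tooling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters assumed not to change the page content.
IGNORED_PARAMS = frozenset({"top"})


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Return url with the ignored query parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    for variant in (
        "http://www.example123.com/fun/life/1084.php",
        "http://www.example123.com/fun/life/1084.php?top=true",
    ):
        # Both variants should map to the same preferred URL.
        print(variant, "->", canonical_url(variant))
```

Whether a given parameter really leaves the content unchanged is a per-site decision, so the ignore list would need to be reviewed before relying on anything like this.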
Adding a huge new product range to eCommerce site and worried about Duplicate Content
Hey all, We currently run a large eCommerce site that has around 5000 pages of content and ranks quite strongly for a lot of key search terms. We have just recently finalised a business agreement to incorporate a new product line that complements our existing catalogue, but I am concerned about dumping this huge amount of content (that is sourced via an API) onto our site and the effect it might have in dragging us down for our existing type of product. In regards to the best way to handle it, we are looking at a few ideas and wondered what SEOMoz thought was the best. Some approaches we are tossing around include:
1. Making each page point to the original API the data comes from as the canonical source (not ideal, as I don't want to pass link juice from our site to theirs).
2. Adding "noindex" to all the new pages so Google simply ignores them, and hoping we get side sales onto our existing product instead of trying to rank, as the new range is highly competitive (again not ideal, as we would like to get whatever organic traffic we can).
3. Manually rewriting each and every new product page's descriptions, tags etc. (a huge undertaking in terms of working hours, given it will be around 4,400 new items added to our catalogue).
Currently the industry standard seems to be to just pull the text from the API and leave it, but doing exact text searches shows that there are literally hundreds of other sites using the exact same duplicate content... I would like to persuade higher management to invest the time into rewriting each individual page, but it would be a huge task and be difficult to maintain as changes continually happen. Sorry for the wordy post, but this is a big decision that potentially has drastic effects on our business, as the vast majority of it is conducted online. Thanks in advance for any helpful replies!
Intermediate & Advanced SEO | ExperienceOz