Duplicate content
-
I run about 10 sites, and most of them seemed to fall foul of the Penguin update. Even though I have never sought inorganic links, I have been frantically searching for a link-based answer since April.
However, since asking a question here I have been pointed in another direction by one of your contributors. It seems at least 6 of my sites have duplicate content issues.
If you search Google for "We have selected nearly 200 pictures of short haircuts and hair styles in 16 galleries", which is the first bit of text from the site short-hairstyles.com, about 30000 results appear. I don't know where they're from nor why anyone would want to do this. I presume it's automated since there is so much of it.
I have decided to redo the content. So I guess (hope) at some point in the future the duplicate nature will be flushed from Google's index?
But how do I prevent it happening again? It's impractical to redo the content every month or so.
For example, if you search for "This facility is written in Flash to use it you need to have Flash installed." from another of my sites, where I coincidentally uploaded a new page a couple of days ago, only the duplicate content shows up, not my original site. So whoever is doing this is finding new stuff on my site and getting it indexed on Google before Google even sees it on my site!
Thanks,
Ian
-
I don't have any experience with Cloudflare so I can't offer an opinion on their services. And without a proper audit of your site and link profile, there is no honest way to know exactly what the core issues are on the site. Short of a proper audit, it's all a guess. That's the bigger concern.
Maybe it's links. Maybe it's duplicate content perception. Maybe it's a dozen seemingly insignificant issues that accumulated to the breaking point with a trigger event like Penguin.
Unfortunately that's the reality of SEO in 2012.
-
OK, maybe I'm not getting something or not explaining myself properly.
When I say things like "30000 times", "every page" and "it is the majority of the content", in the context that I have in my head I'm saying it's not a trivial thing and I have looked into it at length.
If you thought there was some verification needed to answer the question the information is there to have a look.
Complex things are made up of lots of uncomplex things.
How strong is this site? Up until April I'd say very strong: it came in at number 1 for several high-volume keywords (and still does in Bing and Yahoo).
As I said in the original question I have decided to redo most of the content on this site anyway so whether this whole issue is an issue or not isn't an issue.
The original question was: how do you prevent it happening again? Are rel-author, rel-publisher and G+ the answer?
Or what about this? http://www.cloudflare.com/plans
-
"it is the majority of my content". That's what I asked originally: whether it is the majority of the content on individual pages. If that's true, it could be a cause of problems. However, SEO is an extremely complex process with multiple algorithms, so without a detailed review of the site it's dangerous to assume that this specific issue is the cause of your problems.
How strong is your site in other regards? Do you implement rel-author or rel-publisher code and tie it to a Google+ account to communicate you're the original source? Do you have enough other trust signals in place? There are many other similar questions that need to be answered before anyone can confidently make serious recommendations.
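As a rough way to answer the rel-author / rel-publisher question for a given page, the sketch below scans a page's HTML for link or anchor tags that carry those rel values. It only uses Python's standard `html.parser`; the class name `RelLinkFinder`, the helper `find_authorship_links`, and the Google+ placeholder URLs in `sample` are all hypothetical, and finding the markup says nothing about whether Google acts on it.

```python
from html.parser import HTMLParser


class RelLinkFinder(HTMLParser):
    """Collect href values of <link> and <a> tags whose rel matches a wanted value."""

    def __init__(self, wanted=("author", "publisher")):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}  # rel value -> list of hrefs

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely
        rels = (attr.get("rel") or "").lower().split()
        for rel in rels:
            if rel in self.wanted:
                self.found.setdefault(rel, []).append(attr.get("href"))


def find_authorship_links(html):
    """Return a dict of the rel=author / rel=publisher links found in html."""
    finder = RelLinkFinder()
    finder.feed(html)
    return finder.found


# Hypothetical page; the profile and page IDs are placeholders, not real accounts.
sample = """
<html><head>
<link rel="publisher" href="https://plus.google.com/EXAMPLE_PAGE_ID">
</head><body>
<a rel="author" href="https://plus.google.com/EXAMPLE_PROFILE_ID">Ian</a>
</body></html>
"""
print(find_authorship_links(sample))
```

An empty result for a page simply means the markup is absent there; it is one trust signal among the several mentioned above, not a diagnosis.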
-
1. Google doesn't seem to know this and has penalised my sites for something.
2. It is the majority of the content. It's pretty much all of it, up to 30000 times.
3. I've lost 70% of my traffic via recent Google updates. That is THE overwhelming concern, which is why I came and joined this site.
I arrived at this point by asking this question: http://www.seomoz.org/q/penguin-issues If you disagree with the track I got sent on, can you suggest a different one?
-
1. You're not generating the duplicate content, so there's nothing you can logically do about it on any kind of scalable frequency, let alone prevent it.
2. If it's not the majority of content on a page, it's not a serious problem. In fact, it's common on the internet.
3. Don't allow non-issues to become an overwhelming concern. Focus on what you can do something about, and on things that are more important, really do have a negative impact on your SEO, and are within your control.
-
OK, but the snippet is an exact match (in speech marks) and there are 30000 of them. That's not just monkeys typing Shakespeare. Every page (300 or so) on that site has unique content, and more or less each page has up to 30000 duplicates: most have a lot fewer than 30000, but a lot more than 1, which is what it should be. If there were a couple of coincidences, fine, but there aren't.
-
Just finding a snippet that's as short as the examples you gave is not a reason to be concerned about duplicate content in itself. A typical page should have hundreds of words and rank for whatever phrase or phrases you care about, not for a single sentence within the content.
If, on the other hand, you have the overwhelming majority of the content from one of your pages duplicated, that's a reason to be concerned.
So - how much content do you have on YOUR site on the page(s) in question? And have you checked to find out if the majority is duplicated? That's where the focus needs to be.
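One way to check whether the majority of a page is duplicated, rather than a single sentence, is to compare overlapping word sequences ("shingles") between your page text and a suspected copy. The following is a minimal sketch, assuming you have already extracted the plain text of both pages; the function names, the 5-word shingle size, and the second sentence of each sample string are illustrative assumptions, not part of the sites discussed.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def duplicated_share(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    own = shingles(original, size)
    if not own:
        return 0.0
    return len(own & shingles(suspect, size)) / len(own)


# The first sentence is the snippet quoted in the question; the rest is made up.
page = ("We have selected nearly 200 pictures of short haircuts and hair "
        "styles in 16 galleries. Each gallery is sorted by length and face shape.")
copy = ("We have selected nearly 200 pictures of short haircuts and hair "
        "styles in 16 galleries. Visit our sponsors for cheap deals.")

share = duplicated_share(page, copy)
print(f"{share:.0%} of the page's 5-word shingles also appear in the copy")
```

A share close to 100% across many copies would support the "overwhelming majority" case; a low share suggests only a snippet was lifted.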
Related Questions
-
Consolidating two different domains to point at same site, duplicate content penalty?
I have two websites that are extremely similar and want to consolidate them into one website by pointing both domain names at one website. Is this going to cause any duplicate content penalties by having two different domain names pointing at the same site? Both domains get traffic, so I don't want to just discontinue one of the domains.
Intermediate & Advanced SEO | | Ron100 -
Duplicate Multi-site Content, Duplicate URLs
We have 2 ecommerce sites that are 95% identical. Both sites carry the same 2000 products, and for the most part, have the identical product descriptions. They both have a lot of branded search, and a considerable amount of domain authority. We are in the process of changing out product descriptions so that they are unique. Certain categories of products rank better on one site than another. When we've deployed unique product descriptions on both sites, we've been able to get some double listings on Page 1 of the SERPs. The categories on the sites have different names, and our URL structure is www.domain.com/category-name/sub-category-name/product-name.cfm. So even though the product names are the same, the URLs are different including the category names. We are in the process of flattening our URL structures, eliminating the category and subcategory names from the product URLs: www.domain.com/product-name.cfm. The upshot is that the product URLs will be the same. Is that going to cause us any ranking issues?
Intermediate & Advanced SEO | | AMHC0 -
Duplicate Content / Canonical Conundrum on E-Commerce Website
Hi all, I’m looking for some expert advice on use of canonicals to resolve duplicate content for an e-Commerce site. I’ve used a generic example to explain the problem (I do not really run a candy shop).
SCENARIO
I run a candy shop website that sells candy dispensers and the candy that goes in them. I sell about 5,000 different models of candy dispensers and 10,000 different types of candy. Much of the candy fits in more than one candy dispenser, and some candy dispensers fit exactly the same types of candy as others. To make things easy for customers who need to fill up their candy dispensers, I provide a “candy finder” tool on my website which takes them through three steps:
1. Pick your candy dispenser brand (e.g. Haribo)
2. Pick your candy dispenser type (e.g. soft candy or hard candy)
3. Pick your candy dispenser model (e.g. S4000-A)
RESULT: The customer is then presented with a list of candy products that they can buy, on a URL like this:
Candy-shop.com/haribo/soft-candy/S4000-A
All of these steps are presented as HTML pages with followable/indexable links.
PROBLEM: There is a duplicate content issue with the results pages. This is because a lot of the candy dispensers fit exactly the same candy (e.g. S4000-A, S4000-B and S4000-C). This means that the content on these pages is basically the same, because the same candy products are listed. I’ll call these the “duplicate dispensers”. E.g.
Candy-shop.com/haribo/soft-candy/S4000-A
Candy-shop.com/haribo/soft-candy/S4000-B
Candy-shop.com/haribo/soft-candy/S4000-C
The page titles/headings change based on the dispenser model, but that’s not enough for the pages to be deemed unique by Moz. I want to drive organic traffic from searches for the dispenser model candy keywords, but I’m guessing duplicate content like this is holding back any of these dispenser pages from ranking.
SOLUTIONS
1. Write unique content for each of the duplicate dispenser pages: Manufacturers add or discontinue about 500 dispenser models each quarter and I don’t have the resources to keep on top of this content. I would also question the real value of this content to a user when it’s pretty obvious what the products on the page are.
2. Pick one duplicate dispenser to act as a rel=canonical and point all its duplicates at it. This doesn’t work, as dispensers get discontinued, so I run the risk of randomly losing my canonicals or them changing as models become unavailable.
3. Create a single page with all of the duplicate dispensers on, and canonical all of the individual duplicate pages to that page. e.g.
Canonical: candy-shop.com/haribo/soft-candy/S4000-Series
Duplicates (which all point to canonical):
candy-shop.com/haribo/soft-candy/S4000-Series?model=A
candy-shop.com/haribo/soft-candy/S4000-Series?model=B
candy-shop.com/haribo/soft-candy/S4000-Series?model=C
PROPOSED SOLUTION
Option 3. Anyone agree/disagree or have any other thoughts on how to solve this problem? Thanks for reading.
Intermediate & Advanced SEO | | webmethod0 -
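Option 3 in the candy shop question above amounts to mapping every `?model=` variant onto one series URL. A minimal sketch of that mapping follows, using only Python's standard `urllib.parse`; the helper names `canonical_url` and `canonical_tag` are hypothetical, and the `https://` scheme is an assumption because the question gives the URLs without one.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Drop the query string and fragment so every ?model= variant maps to one URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url):
    """Build the <link rel="canonical"> element for a variant URL."""
    return f'<link rel="canonical" href="{canonical_url(url)}">'


# Scheme is assumed; the path and ?model= parameter come from the question.
for model in ("A", "B", "C"):
    variant = f"https://candy-shop.com/haribo/soft-candy/S4000-Series?model={model}"
    print(canonical_tag(variant))
```

Because the canonical is the series page rather than any single model, discontinuing a model would not change the tag the remaining variants emit.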
Duplicate
Is it harmful to have two of these which are identical in the section?
Intermediate & Advanced SEO | | Sika220 -
Why are these pages considered duplicate content?
I have a duplicate content warning in our PRO account (well several really) but I can't figure out WHY these pages are considered duplicate content. They have different H1 headers, different sidebar links, and while a couple are relatively scant as far as content (so I might believe those could be seen as duplicate), the others seem to have a substantial amount of content that is different. It is a little perplexing. Can anyone help me figure this out? Here are some of the pages that are showing as duplicate: http://www.downpour.com/catalogsearch/advanced/byNarrator/narrator/Seth+Green/?bioid=5554 http://www.downpour.com/catalogsearch/advanced/byAuthor/author/Solomon+Northup/?bioid=11758 http://www.downpour.com/catalogsearch/advanced/byNarrator/?mediatype=audio+books&bioid=3665 http://www.downpour.com/catalogsearch/advanced/byAuthor/author/Marcus+Rediker/?bioid=10145 http://www.downpour.com/catalogsearch/advanced/byNarrator/narrator/Robin+Miles/?bioid=2075
Intermediate & Advanced SEO | | DownPour0 -
Virtual Domains and Duplicate Content
So I work for an organization that uses virtual domains. Basically, we have all our sites on one domain and then these sites can also be shown at a different URL. Example: sub.agencysite.com/store sub.brandsite.com/store Now the problem comes up often when we move the site to a brand's URL versus hosting the site on our URL: we end up with duplicate content. Now for god knows what damn reason, I currently cannot get my dev team to implement 301s but they will implement 302s. (Don't ask.) I also am left with not being able to change the robots.txt file for our site. They say if we allowed people to go in and change this stuff it would be too messy and somebody would accidentally block a site that was not supposed to be blocked on our domain. (We are apparently incapable toddlers.) Now I have an old site, sub.agencysite.com/store, ranking for my terms while the new site is not showing up. So I am left with this question: If I want to get the new site ranking, what is the best methodology? I am thinking of doing a 1:1 mapping of all pages and setting up 302 redirects from the old to the new and then making the canonical tags on the old reflect the new. My only thing here is how will Google actually view this setup? I mean on one hand I am saying "Hey, Googs, this is just a temp thing." and on the other I am saying "Hey, Googs, give all the weight to this page, got it? Graci!" So with my limited abilities, can anybody provide me a best case scenario?
Intermediate & Advanced SEO | | DRSearchEngOpt0 -
How to compete with duplicate content in post panda world?
I want to fix duplicate content issues on my eCommerce website. I have read a very valuable blog post on SEOmoz regarding duplicate content in a post-Panda world and applied all of its strategies to my website. I want to give one example to learn more about it. http://www.vistastores.com/outdoor-umbrellas Non-WWW version: http://vistastores.com/outdoor-umbrellas redirects to the home page. For HTTPS pages: https://www.vistastores.com/outdoor-umbrellas I have created a robots.txt file for all HTTPS pages as follows. https://www.vistastores.com/robots.txt And I have set rel=canonical to the HTTP page as follows. http://www.vistastores.com/outdoor-umbrellas Narrow by search: My website has narrow-by-search and contains pages with the same meta info, as follows. http://www.vistastores.com/outdoor-umbrellas?cat=7 http://www.vistastores.com/outdoor-umbrellas?manufacturer=Bond+MFG http://www.vistastores.com/outdoor-umbrellas?finish_search=Aluminum I have restricted all dynamic pages generated by narrow-by-search with robots.txt. http://www.vistastores.com/robots.txt And I have set rel=canonical to the base URL on each dynamic page. Order-by pages: http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name I have restricted all pages with robots.txt and set rel=canonical to the base URL. For pagination pages: http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name&p=2 I have restricted all pages with robots.txt and set rel=next and rel=prev on all paginated pages. I have also set rel=canonical to the base URL. I have applied all the SEO suggestions to my website, but Google is crawling and indexing 21K+ pages. My website has only 9K product pages. Google search result: https://www.google.com/search?num=100&hl=en&safe=off&pws=0&gl=US&q=site:www.vistastores.com&biw=1366&bih=520 Over the last 7 days, my website's impressions and CTR have dropped by 75%. I want to recover and perform as well as before.
I have explained my question at length because I want to recover my traffic as soon as possible.
Intermediate & Advanced SEO | | CommercePundit0 -
Duplicate page Content
There have been over 300 pages on our client's site with duplicate page content. Before we embark on a programming solution to this with canonical tags, our developers are requesting the list of originating sites/links/sources for these odd URLs. How can we find a list of the originating URLs? If you can provide a list of originating sources, that would be helpful. For example, the following pages are showing (as a sample) as duplicate content: www.crittenton.com/Video/View.aspx?id=87&VideoID=11 www.crittenton.com/Video/View.aspx?id=87&VideoID=12 www.crittenton.com/Video/View.aspx?id=87&VideoID=15 www.crittenton.com/Video/View.aspx?id=87&VideoID=2 "How did you get all those duplicate URLs? I have tried to google the "contact us", "news", "video" pages. I didn't get all those duplicate pages. The page id=87 on most of the duplicate pages is not supposed to be there. I was wondering how the visitors got to all those duplicate pages. Please advise." Note, the CMS does not create this type of hybrid URL. We are as curious as you as to where/why/how these are being created. Thanks.
Intermediate & Advanced SEO | | dlemieux0