Content Aggregation Site: How much content per aggregated piece is too much?
-
Let's say I set up a section of my website that aggregated content from major news outlets and bloggers around a certain topic. For each piece of aggregated content, is there a bad, fair, and good range of word count that should be stipulated?
I'm asking this because I've been mulling it over—both SEO (duplicate content) issues and copyright issues—to determine what is considered best practice. Any ideas about what is considered best practice in this situation? Also, are there any other issues to consider that I didn't mention?
-
Ryan's resources look good regarding copyright & fair usage. If you're citing a paragraph or two and linking to the source, you're generally in the clear.
In terms of SEO, you'll want to be adding as much unique content to the page as you're citing if you plan on indexing the content. Here are examples of sites that do this curation approach well:
If you're just going to pull the content in directly from an RSS feed, or if you're just adding a sentence plus the quoted text and a link, then you're probably not adding enough value for the content to be worth indexing. I'd set the meta robots tag to "noindex, follow" in this case.
-
There are several examples of sites that do this which you can reference: Verge, Metafilter, HuffPo, BoingBoing, etc. but as far as allowable text goes, you'll want to look up fair use rules for wherever you're located, but here's the US and Wikipedia discussion of them.
http://www.copyright.gov/fls/fl102.html
http://en.wikipedia.org/wiki/Fair_useTypically, you'll want to write some surrounding editorial content (unique) and of course link back to your source. If what you're doing is purely link based, popurls.com or alltop could be good examples. Cheers!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My site is not ranking at all.
Can anybody check it what is the main culprit behind my website's growth?
Intermediate & Advanced SEO | | anshu14320 -
Best way to "Prune" bad content from large sites?
I am in process of pruning my sites for low quality/thin content. The issue is that I have multiple sites with 40k + pages and need a more efficient way of finding the low quality content than looking at each page individually. Is there an ideal way to find the pages that are worth no indexing that will speed up the process but not potentially harm any valuable pages? Current plan of action is to pull data from analytics and if the url hasn't brought any traffic in the last 12 months then it is safe to assume it is a page that is not beneficial to the site. My concern is that some of these pages might have links pointing to them and I want to make sure we don't lose that link juice. But, assuming we just no index the pages we should still have the authority pass along...and in theory, the pages that haven't brought any traffic to the site in a year probably don't have much authority to begin with. Recommendations on best way to prune content on sites with hundreds of thousands of pages efficiently? Also, is there a benefit to no indexing the pages vs deleting them? What is the preferred method, and why?
Intermediate & Advanced SEO | | atomiconline0 -
My site shows 503 error to Google bot, but can see the site fine. Not indexing in Google. Help
Hi, This site is not indexed on Google at all. http://www.thethreehorseshoespub.co.uk Looking into it, it seems to be giving a 503 error to the google bot. I can see the site I have checked source code Checked robots Did have a sitemap param. but removed it for testing GWMT is showing 'unreachable' if I submit a site map or fetch Any ideas on how to remove this error? Many thanks in advance
Intermediate & Advanced SEO | | SolveWebMedia0 -
How should I manage duplicate content caused by a guided navigation for my e-commerce site?
I am working with a company which uses Endeca to power the guided navigation for our e-commerce site. I am concerned that the duplicate content generated by having the same products served under numerous refinement levels is damaging the sites ability to rank well, and was hoping the Moz community could help me understand how much of an impact this type of duplicate content could be having. I also would love to know if there are any best practices for how to manage this type of navigation. Should I nofollow all of the URLs which have more than 1 refinement used on a category, or should I allow the search engines to go deeper than that to preserve the long tail? Any help would be appreciated. Thank you.
Intermediate & Advanced SEO | | FireMountainGems0 -
Backlinking 3 sites from same domain and backlinking main site too
Hello, we have 4 sites, in which 1 is a main site and rest 3 are niche sites All these 3 sites have dofollow links to main site from home page We got a high quality backlink - through which all 3 niche sites have got it from that domain Is it worth to add backlink from that domain to main site too, despite the fact the 3 sites already have recvd it and they all link to main site many thanks
Intermediate & Advanced SEO | | Modi0 -
Development site crawled
We just found out our password protected development site has been crawled. We are worried about duplicate content - what are the best steps to take to correct this beyond adding to robots.txt?
Intermediate & Advanced SEO | | EileenCleary0 -
Amazing decrease of visits in a Good Content Site
Dear Sirs, contributors and aspirants of Seomoz: I have a site called General History (http://general-history.com/) that was created in 2010, and has a current PR of 3, a DA of 23 and a home page authority of 32. It also has 1.690 links, knowing that we have not invested on link building, all the links were built manually via post inserting or viral via social shares. The thing is that in only 5 months, it passed from receiving 14.000 visits/per month to only 1.500. Is that a decrease of 700% in 5 months? I must admint that I earn my life offering SEO to companies, but this is one of my own sites, a site in which my 73 year old father likes to write about General History. I really think, given that he used to be a journalist, that the content not only isn't spam but it is high quality content. As I had Analytics, I started searching for the cause. The first question was... 1.- From what source did I loose the most amount of visitors? Organic, Paid or Social. The answer is organic by far. As I discovered it was an organic loss, I tried to find what content used to have the most visitors. I found 3 posts that brought 80% of the total traffic. How did the people find the content? Well, some of them found the site in the first page of google when searching for "Holocaust facts and figures" for example, but Analytics says that the most people came from image search in Google Images. General history disappeared from the SERPs but progressively, not from one day to another. So then I thought, It can't be a penalization. I contacted google and send them a reconsideration. 5 days later they answered saying that general-history.com is not a spammy site and thus it has not been penalized. For the ones who can read Spanish, here is Google answer: "Estimado webmaster o propietario del sitio http://general-history.com/: Hemos recibido una solicitud del propietario de un sitio para que volvamos a comprobar si http://general-history.com/ cumple las directrices para webmasters de Google. Hemos revisado tu sitio y no hemos detectado acciones manuales del equipo de webspam que puedan perjudicar la clasificación del mismo en Google. No es necesario que presentes una solicitud de reconsideración para el mismo, ya que las incidencias relacionadas con la clasificación que puedan producirse no se derivan de acciones manuales realizadas por el equipo de webspam. Existen otras incidencias relacionadas con tu sitio que pueden perjudicar la clasificación del mismo. Los ordenadores de Google determinan el orden de los resultados de búsqueda a través de una serie de fórmulas denominadas algoritmos. Cada año, se realizan cientos de cambios en los algoritmos de búsqueda, y se utilizan más de 200 señales diferentes para clasificar páginas. A medida que cambian los algoritmos y la Web (incluido tu sitio), se pueden producir fluctuaciones en la clasificación, ya que se actualiza para ofrecer a los usuarios los resultados más relevantes. Si has detectado un cambio en la clasificación y consideras que no se debe simplemente a un cambio de algoritmos, te recomendamos que investigues otras posibles causas, como un cambio importante en el contenido del sitio, en el sistema de gestión de contenido o en la arquitectura del servidor. Por ejemplo, es posible que un sitio no obtenga una buena posición en los resultados de búsqueda si el servidor deja de proporcionar páginas a Googlebot o si el usuario cambia las URL de una gran parte de las páginas del sitio. En este artículo se incluye una lista de otros posibles motivos por los que tu sitio no obtiene una buena clasificación en los resultados de búsqueda. Si sigues sin poder solucionar la incidencia, accede al foro de ayuda para webmasters para obtener asistencia. Atentamente, Equipo de Calidad de búsqueda de Google" They say interesting things like it might be other problems that caused my position decrease like: Site content change, content management, server architecture or change or urls. After receiving this, I thought I should get in the admin panel in wordpress and search for bugs, html or css, php errors and I found that somebody had hijacked my site, entering the wordpress panel and adding a code of into one of my landing pages. That page does not exist anymore. I erased completely. The span code was as follows:
Intermediate & Advanced SEO | | Tintanus
General History | General-History General History | General-HistoryGeneral History | General-HistoryGeneral History | General-HistoryGeneral History | General-HistoryGeneral History | General-HistoryGeneral History | General-HistoryGeneral History | General-HistoryGeneral History | General-HistoryGeneral History | General-History I thought that would be the problem ! But it was NOT, because Google did not penalize me as you can see in the letter they sent me. I erased the complete page in which the span appeared, I updated my sitemap, re-check my robots.txt, searched my folders via FTP and mucho more... Conclusion? I have no idea why I General-History has lost 700% of its traffic in 5 months.0 -
Duplicate content for swatches
My site is showing a lot of duplicate content on SEOmoz. I have discovered it is because the site has a lot of swatches (colors for laminate) within iframes. Those iframes have all the same content except for the actual swatch image and the title of the swatch. For example, these are two of the links that are showing up with duplicate content: http://www.formica.com/en/home/dna.aspx?color=3691&std=1&prl=PRL_LAMINATE&mc=0&sp=0&ots=&fns=&grs= http://www.formica.com/en/home/dna.aspx?color=204&std=1&prl=PRL_LAMINATE&mc=0&sp=0&ots=&fns=&grs= I do want each individual swatch to show up in search results and they currently are if you search for the exact swatch name. Is the fact that they all have duplicate content affecting my individual rankings and my domain authority? What can I do about it? I can't really afford to put unique content on each swatch page so is there another way to get around it? Thanks!
Intermediate & Advanced SEO | | AlightAnalytics0