Best way to "Prune" bad content from large sites?
-
I am in the process of pruning my sites for low-quality/thin content. The issue is that I have multiple sites with 40k+ pages and need a more efficient way of finding the low-quality content than looking at each page individually. Is there a way to find the pages worth noindexing that will speed up the process without risking harm to any valuable pages?
My current plan is to pull data from analytics: if a URL hasn't brought any traffic in the last 12 months, it's probably safe to assume the page is not benefiting the site. My concern is that some of these pages might have links pointing to them, and I want to make sure we don't lose that link juice. But assuming we just noindex the pages, the authority should still pass along... and in theory, pages that haven't brought any traffic to the site in a year probably don't have much authority to begin with.
Any recommendations on the best way to prune content efficiently on sites with tens of thousands of pages? Also, is there a benefit to noindexing the pages vs. deleting them? What is the preferred method, and why?
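The plan of pulling analytics data can be combined with the backlink concern in one pass: join a traffic export against a backlink export, and only auto-flag URLs that have neither traffic nor links, routing linked-but-dead pages to manual review. A minimal sketch, assuming you've exported page-level sessions and linking-domain counts as lookups (the dict inputs and field meanings here are illustrative stand-ins for CSV exports from your analytics and backlink tools):

```python
def classify_pages(sessions_12mo, backlink_counts):
    """Split URLs into prune candidates vs. pages to review.

    sessions_12mo:   {url: sessions over the last 12 months}
    backlink_counts: {url: number of external linking domains}
    """
    prune, review = [], []
    for url, sessions in sessions_12mo.items():
        if sessions > 0:
            continue  # page still earns traffic -- keep it
        if backlink_counts.get(url, 0) > 0:
            review.append(url)   # no traffic, but has links: review by hand
        else:
            prune.append(url)    # no traffic, no links: safe prune candidate
    return prune, review

# Toy example:
sessions = {"/a": 120, "/b": 0, "/c": 0}
links = {"/a": 5, "/b": 3}
prune, review = classify_pages(sessions, links)
print(prune)   # ["/c"]
print(review)  # ["/b"]
```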
-
I have a section of my website where I make heavy use of embedded content - embeds from YouTube, SlideShare, Twitter, Quora, etc. Google thinks those pages are thin, and they don't show up in my analytics because visitors can read the content without clicking through to the page.
http://getonthemap.us/twitter/blog
But I like them, and I think they're helpful. So I no-indexed all but one of the blog posts in that section. It retains the backlinks to the posts, but cleans me up with Google.
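When noindexing a whole section like this, it's worth verifying the tag actually made it onto every page. A rough sketch of such a check - a naive regex scan of each page's HTML (a real audit would fetch each URL and might prefer a proper HTML parser):

```python
import re

def is_noindexed(html):
    """Return True if a robots meta tag in the HTML contains 'noindex'.

    Naive regex check -- fine for a quick audit, not a full parser.
    """
    for tag in re.finditer(r'<meta[^>]+name=["\']robots["\'][^>]*>', html, re.I):
        if "noindex" in tag.group(0).lower():
            return True
    return False

indexed = '<html><head><meta name="robots" content="index,follow"></head></html>'
blocked = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(is_noindexed(indexed))  # False
print(is_noindexed(blocked))  # True
```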
If you're deleting, can't you do that quickly from your console?
-
It's hard to say exactly without seeing your site, since there are so many variables (e.g. are most of your posts low quality, or just a minority?) that would determine the best way to go about it.
What I can say is that you're on the right track in using analytics data to determine which pages are providing value right now. There is a danger of losing some rankings if you remove a huge volume of these posts: unless they're utter rubbish, they're likely providing Google with relevance signals about what your site covers. That said, I do think it's a necessary evil, and I'd expect you'll be rewarded in the long run, provided you replace the trash with high-quality posts in the future.
As for the benefits: if the pages really are low quality, user engagement on them is going to be terrible, which is obviously not what you should be aiming for. They're also chewing up your crawl budget for no good reason, so the leaner your site is, the better the base you have for rebuilding with quality instead of quantity. For the same reason, I generally suggest removing tags and categories that aren't providing any actual benefit - in most cases they're there either "for good SEO" or because the site owner thinks that's how users browse the site, but in almost all cases that's not true. As always, check your own data on this to be sure.
As for removing vs noindex, this one is always contentious but I lean toward removing simply because it's going to clean things up for the user too and ultimately they should be your primary focus. Having 40,000+ pages of trash on your website is a fantastic indicator to them that your site may not be somewhere they want to be and noindexing them won't do anything to change the user's experience.
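If the removal route is taken, returning 410 Gone (rather than a plain 404) for the pruned URLs tells crawlers the removal is deliberate, and search engines tend to drop 410s from the index faster. A sketch of generating exact-match server rules from the prune list - nginx syntax is assumed here purely for illustration; translate to whatever your server uses:

```python
def nginx_410_rules(removed_paths):
    """Build an exact-match nginx location block returning 410 Gone
    for each pruned URL path."""
    lines = []
    for path in sorted(removed_paths):
        lines.append(f"location = {path} {{ return 410; }}")
    return "\n".join(lines)

print(nginx_410_rules(["/old-post", "/thin-page"]))
# location = /old-post { return 410; }
# location = /thin-page { return 410; }
```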
Hope that helps!
Related Questions
-
What is the best SEO way for a shop
Hi there! A client wants to sell some products on their future website, but only a small range (most of the website will not be an online shop). The idea is to add a "shop" button in the menu that takes visitors to the shop. I would like your opinion on how I should structure this shop. What do you think is best for SEO: "www.website.com/shop" or "shop.website.com"? Thank you in advance for your answers!
Intermediate & Advanced SEO | EnjinFrance
-
Duplicate Multi-site Content, Duplicate URLs
We have 2 ecommerce sites that are 95% identical. Both sites carry the same 2000 products, and for the most part, have the identical product descriptions. They both have a lot of branded search, and a considerable amount of domain authority. We are in the process of changing out product descriptions so that they are unique. Certain categories of products rank better on one site than another. When we've deployed unique product descriptions on both sites, we've been able to get some double listings on Page 1 of the SERPs. The categories on the sites have different names, and our URL structure is www.domain.com/category-name/sub-category-name/product-name.cfm. So even though the product names are the same, the URLs are different including the category names. We are in the process of flattening our URL structures, eliminating the category and subcategory names from the product URLs: www.domain.com/product-name.cfm. The upshot is that the product URLs will be the same. Is that going to cause us any ranking issues?
Intermediate & Advanced SEO | AMHC
-
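The migration described above - collapsing www.domain.com/category-name/sub-category-name/product-name.cfm down to www.domain.com/product-name.cfm - implies a 301 from every old nested URL to its flattened equivalent. A minimal sketch of generating that mapping (the example paths are hypothetical; a real migration would run this over a full URL export and check for product-name collisions):

```python
def flatten_url(old_url):
    """Map a nested product URL to its flattened equivalent,
    e.g. /category/sub-category/product-name.cfm -> /product-name.cfm,
    so a 301 redirect can be emitted for every old URL."""
    return "/" + old_url.strip("/").rsplit("/", 1)[-1]

old_urls = ["/garden/hoses/green-hose-50ft.cfm", "/tools/hammers/claw-hammer.cfm"]
redirects = {old: flatten_url(old) for old in old_urls}
for src, dest in redirects.items():
    print(f"{src} -> {dest}")
```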
HELP! How does one prevent regional pages from being counted as "duplicate content," "duplicate meta descriptions," et cetera?
The organization I am working with has multiple versions of its website geared towards different regions: US - http://www.orionhealth.com/ CA - http://www.orionhealth.com/ca/ DE - http://www.orionhealth.com/de/ UK - http://www.orionhealth.com/uk/ AU - http://www.orionhealth.com/au/ NZ - http://www.orionhealth.com/nz/
Some of these sites have very similar pages which are registering as duplicate content, meta descriptions and titles. Two examples: http://www.orionhealth.com/terms-and-conditions and http://www.orionhealth.com/uk/terms-and-conditions
Even though the content is the same, the navigation is different, since each region has different product options and services - so a redirect won't work, because the navigation on the main US site differs from the navigation on the UK site. A rel=canonical seems like a viable option, but (correct me if I'm wrong) it tells search engines to index only the main page - in this case the US version - and I still want the UK site to appear to search engines. So what is the proper way of treating similar pages across different regional directories? Any insight would be GREATLY appreciated! Thank you!
Intermediate & Advanced SEO | Scratch_MM
-
Bad site migration - what to do!
Hi Mozzers - I'm looking at a site which has been damaged by a very poor site migration. Basically, the old URLs were 301'd to a single page on the new website (not a 404) telling everyone the page no longer existed; they did not 301 old pages to their equivalent new pages. I just checked Google WMT and saw 1,000 crawl errors - basically the old URLs. The migration was done back in February, since when traffic to the website has never recovered. Should I fix this now? Is it worth implementing the correct 301s after so much time has passed?
Intermediate & Advanced SEO | McTaggart
-
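Re-implementing the correct 301s for a case like the one above means mapping each of the 1,000 dead URLs to its nearest equivalent on the new site. A first pass at that mapping can be automated with simple string similarity, leaving only the low-confidence cases for manual review. A rough sketch (difflib ratios are a crude proxy for "equivalent page" - hand-check the output before deploying, and the example URLs are hypothetical):

```python
from difflib import get_close_matches

def build_redirect_map(old_urls, new_urls, cutoff=0.5):
    """Pair each dead URL with its closest match on the new site.

    Anything without a confident match maps to None and should be
    handled by hand (or allowed to 404/410 if no equivalent exists).
    """
    mapping = {}
    for old in old_urls:
        match = get_close_matches(old, new_urls, n=1, cutoff=cutoff)
        mapping[old] = match[0] if match else None
    return mapping

old = ["/products/blue-widget", "/about-us.html"]
new = ["/shop/blue-widget", "/about", "/contact"]
print(build_redirect_map(old, new))
```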
Duplicate content issue - online retail site.
Hello Mozzers - I've just looked at a website on which just about every product page (there are hundreds - yikes!) is duplicated by URL parameters appended to the end of each URL, like this:
prod=company-081
prod=company-081&cat=20
Surely this is a serious case of duplicate content? Any idea why a web developer would do this? Thanks in advance! Luke
Intermediate & Advanced SEO | McTaggart
-
How to Best Establish Ownership when Content is Duplicated?
A client (Website A) has allowed one of their franchisees to use some of the content from their site on the franchisee site (Website B). This franchisee lifted the content word for word, so - my question is how to best establish that Website A is the original author? Since there is a business relationship between the two sites, I'm thinking of requiring Website B to add a rel=canonical tag to each page using the duplicated content and referencing the original URL on site A. Will that work, or is there a better solution? This content is primarily informational product content (not blog posts or articles), so I'm thinking rel=author may not be appropriate.
Intermediate & Advanced SEO | Allie_Williams
-
Alarming decrease of visits on a good content site
Dear Sirs, contributors and aspirants of Seomoz: I have a site called General History (http://general-history.com/) that was created in 2010 and currently has a PR of 3, a DA of 23 and a home page authority of 32. It also has 1,690 links, and we have not invested in link building: all the links were built manually via post insertion or came virally via social shares.
The thing is that in only 5 months it went from receiving 14,000 visits per month to only 1,500 - a drop of about 90%. I must admit that I earn my living offering SEO to companies, but this is one of my own sites, a site on which my 73-year-old father likes to write about general history. Given that he used to be a journalist, I really think the content is not only not spam but high quality.
As I had Analytics, I started searching for the cause. The first question was: from which source did I lose the most visitors - organic, paid or social? The answer is organic, by far. Having established it was an organic loss, I looked at which content used to attract the most visitors. I found 3 posts that brought 80% of the total traffic. How did people find that content? Well, some found the site on the first page of Google when searching for "Holocaust facts and figures", for example, but Analytics says most came from Google Images.
General History disappeared from the SERPs, but progressively, not overnight. So then I thought: it can't be a penalty. I contacted Google and sent a reconsideration request. 5 days later they answered saying that general-history.com is not a spammy site and thus has not been penalized.
Here is Google's answer (translated from the Spanish):
"Dear webmaster or owner of http://general-history.com/: We have received a request from the site owner to re-check whether http://general-history.com/ complies with Google's webmaster guidelines. We have reviewed your site and have not detected any manual actions by the webspam team that could harm its ranking in Google. There is no need to submit a reconsideration request, since any ranking issues you may be experiencing are not the result of manual action by the webspam team.
There may be other issues with your site that could harm its ranking. Google's computers determine the order of search results through a series of formulas called algorithms. Hundreds of changes are made to the search algorithms every year, and more than 200 different signals are used to rank pages. As the algorithms and the web (including your site) change, rankings can fluctuate as they are updated to offer users the most relevant results. If you have noticed a ranking change and believe it is not simply due to an algorithm change, we recommend investigating other possible causes, such as a major change to the site's content, content management system or server architecture. For example, a site may not rank well in search results if the server stops serving pages to Googlebot or if the URLs of a large portion of the site's pages are changed. This article includes a list of other possible reasons why your site may not rank well in search results.
If you still cannot resolve the issue, visit the webmaster help forum for assistance. Sincerely, The Google Search Quality team"
So they point to other issues that might have caused my ranking decrease: a change in site content, the content management system, the server architecture, or a change of URLs. After receiving this, I went into the WordPress admin panel to search for bugs - HTML, CSS or PHP errors - and found that somebody had hijacked my site, getting into the WordPress panel and injecting code into one of my landing pages. That page does not exist anymore; I erased it completely. The spam code was as follows:
"General History | General-History" - the site title, repeated over and over dozens of times.
I thought that would be the problem! But it was NOT, because Google did not penalize me, as you can see in the letter they sent me. I erased the entire page in which the spam appeared, updated my sitemap, re-checked my robots.txt, searched my folders via FTP and much more... Conclusion? I have no idea why General-History has lost about 90% of its traffic in 5 months.
Intermediate & Advanced SEO | Tintanus
-
URL structure + process for a large travel site
Hello, I am looking at the URL structure for a travel site that will want to optimise lots of locations for a wide variety of terms, for example:
hotels in london
hotels in kensington (which is in london)
five star hotels in kensington
etc.
I am keen to see if my thought process is correct, as you see so many different URL techniques out there. Or am I overthinking it? Let's assume we make /london/ our homepage. We would then logically link to /london/hotels/ to optimise specifically for 'london hotels'. We then have two options in my mind for optimising for 'kensington hotels':
A. /london/hotels/kensington/ - keeps /london/hotels/ in the URL to maintain consistency
B. /london/kensington/hotels/ - allows us to maintain a logical geo-landing-page hierarchy
I feel A is good, as the URL matches the order of the search phrase 'hotels in kensington', but it loses value if any links point to these pages with 'kensington' in the anchor text, as they would not really strengthen the 'kensington' hub page (/london/kensington/). I.e. if I land on the 'kensington hotels' page and want to see more about kensington, I could go from /london/kensington/hotels/ to /london/kensington/ quite easily and logically in the breadcrumb. I feel B is the best option for now.
Happy to be corrected - I am only musing, as I see some good sites that use option A, which effectively pushes the location to the end of the URL for each additional niche sub-page, i.e. /london/hotels/five-star-hotels/kensington/. Some of the bigger travel sites don't even use folders, they just go: example.com/five-star-hotels-in-kensington/
Comments welcome!!! Thanks
Intermediate & Advanced SEO | onefinestay