Best way to "Prune" bad content from large sites?

atomiconline

I am in process of pruning my sites for low quality/thin content. The issue is that I have multiple sites with 40k + pages and need a more efficient way of finding the low quality content than looking at each page individually. Is there an ideal way to find the pages that are worth no indexing that will speed up the process but not potentially harm any valuable pages?

Current plan of action is to pull data from analytics and if the url hasn't brought any traffic in the last 12 months then it is safe to assume it is a page that is not beneficial to the site. My concern is that some of these pages might have links pointing to them and I want to make sure we don't lose that link juice. But, assuming we just no index the pages we should still have the authority pass along...and in theory, the pages that haven't brought any traffic to the site in a year probably don't have much authority to begin with.

Recommendations on best way to prune content on sites with hundreds of thousands of pages efficiently? Also, is there a benefit to no indexing the pages vs deleting them? What is the preferred method, and why?

julie-getonthemap

I have a section of my website where I heavily use embedded content. Embeds from Youtube, Slideshare, Twitter, Quora etc. Google thinks they're thin, and they don't show up in my analytics because you can read the content without clicking on the page.

http://getonthemap.us/twitter/blog

But I like them, and I think they're helpful. So I no-indexed all but one of the blog posts in that section. It retains the backlinks to the posts, but cleans me up with Google.

If you're deleting, can't you do that quickly from your console?

ChrisAshton

It's hard to say exactly without seeing your site since there are so many potential variables (e.g. are most of your blog posts low quality or just a minority? etc) that would define the best way to go about it.

What I can say though is that you're on the right track as far as using analytics data to determine which ones are providing value right now. There is a danger in losing some rankings if you go removing a huge volume of these posts. Unless they're utter rubbish posts, they'll likely be providing relevance signals to Google on what your site is about. That said, I do think it's a necessary evil and I'd expect you'll be rewarded for it in the long run provided you start replacing the trash with high quality posts in the future.

As for the benefits, if they really are low quality then user engagement is going to be terrible which is obviously not what you should be aiming for. It's also going to be chewing up your crawl budget for no good reason so the leaner your site is, the better base you have to start rebuilding with quality instead of quantity. For the same reason, I generally suggest removing tags and categories that aren't providing any actual benefit too - in most cases I see they're just there either "for good SEO" or because the site owners things that's how users are browsing their site but in almost all cases, that's not true. As always, check your own data on this to be sure.

As for removing vs noindex, this one is always contentious but I lean toward removing simply because it's going to clean things up for the user too and ultimately they should be your primary focus. Having 40,000+ pages of trash on your website is a fantastic indicator to them that your site may not be somewhere they want to be and noindexing them won't do anything to change the user's experience.

Hope that helps!

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Moz Q&A is closed.

Best way to "Prune" bad content from large sites?

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Does redirecting from a "bad" domain "infect" the new domain?

Taxonomy question - best approach for site structure

What is the best way to find related forums in your industry?

"Null" appearing as top keyword in "Content Keywords" under Google index in Google Search Console

On 1 of our sites we have our Company name in the H1 on our other site we have the page title in our H1 - does anyone have any advise about the best information to have in the H1, H2 and Page Tile

Duplicate Content www vs. non-www and best practices

Rel="canonical" and rel="alternate" both necessary?

Best way to permanently remove URLs from the Google index?

Products

Moz Solutions

Free SEO Tools

Resources

About Moz

Why Moz

Get Involved