Best way to "Prune" bad content from large sites?
-
I am in process of pruning my sites for low quality/thin content. The issue is that I have multiple sites with 40k + pages and need a more efficient way of finding the low quality content than looking at each page individually. Is there an ideal way to find the pages that are worth no indexing that will speed up the process but not potentially harm any valuable pages?
Current plan of action is to pull data from analytics and if the url hasn't brought any traffic in the last 12 months then it is safe to assume it is a page that is not beneficial to the site. My concern is that some of these pages might have links pointing to them and I want to make sure we don't lose that link juice. But, assuming we just no index the pages we should still have the authority pass along...and in theory, the pages that haven't brought any traffic to the site in a year probably don't have much authority to begin with.
Recommendations on best way to prune content on sites with hundreds of thousands of pages efficiently? Also, is there a benefit to no indexing the pages vs deleting them? What is the preferred method, and why?
-
I have a section of my website where I heavily use embedded content. Embeds from Youtube, Slideshare, Twitter, Quora etc. Google thinks they're thin, and they don't show up in my analytics because you can read the content without clicking on the page.
http://getonthemap.us/twitter/blog
But I like them, and I think they're helpful. So I no-indexed all but one of the blog posts in that section. It retains the backlinks to the posts, but cleans me up with Google.
If you're deleting, can't you do that quickly from your console?
-
It's hard to say exactly without seeing your site since there are so many potential variables (e.g. are most of your blog posts low quality or just a minority? etc) that would define the best way to go about it.
What I can say though is that you're on the right track as far as using analytics data to determine which ones are providing value right now. There is a danger in losing some rankings if you go removing a huge volume of these posts. Unless they're utter rubbish posts, they'll likely be providing relevance signals to Google on what your site is about. That said, I do think it's a necessary evil and I'd expect you'll be rewarded for it in the long run provided you start replacing the trash with high quality posts in the future.
As for the benefits, if they really are low quality then user engagement is going to be terrible which is obviously not what you should be aiming for. It's also going to be chewing up your crawl budget for no good reason so the leaner your site is, the better base you have to start rebuilding with quality instead of quantity. For the same reason, I generally suggest removing tags and categories that aren't providing any actual benefit too - in most cases I see they're just there either "for good SEO" or because the site owners things that's how users are browsing their site but in almost all cases, that's not true. As always, check your own data on this to be sure.
As for removing vs noindex, this one is always contentious but I lean toward removing simply because it's going to clean things up for the user too and ultimately they should be your primary focus. Having 40,000+ pages of trash on your website is a fantastic indicator to them that your site may not be somewhere they want to be and noindexing them won't do anything to change the user's experience.
Hope that helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Best way to set up URL structure for reviews off of PDP pages.
We are adding existing customer reviews to Product Detail Pages pages. There are about 300 reviews per product so we're going to have to paginate reviews off of the PDP page. I'm wondering what the best url structure for reviews pages is to get the most seo benefit. For example, would it be something like this? site.com/category/product/reviews/page-1 or something that used parameters, such as: site.com/reviews?product=a Also, what is the best way to show that the internal link on the PDP page to "All Reviews" is a higher priority link than the other links on the page?
Intermediate & Advanced SEO | | katseo10 -
Best Practices for Creating Back Links from "Thought Leader" Content
What is the best way to use articles from a "thought leader" to build high-quality links to my website? I have heard that it is possible to pay bloggers to post business articles that link back to a website. That assuming these blogs have domain authority this is a good technique to improve ranking. Is this in fact true, and if so where would I find blogs to post our content. The purpose would be to promote real estate brokerage website. Any suggestions? Is this possible, advisable, best use of quality content? Alternatively, where else can we post engaging content to create links back to our site? Social media? The nature of the content would be such topics as how to find the best value in Manhattan office of loft space rentals, etcera. Thanks, Alan
Intermediate & Advanced SEO | | Kingalan10 -
Duplicating relevant category content in subcategories. Good or bad for google ranking?
In a travel related page I have city categories with city related information.
Intermediate & Advanced SEO | | lcourse
Would you recommend for or against duplicating some relevant city related in subcategory pages. For visitor it would be useful and google should have more context about the topic of our page.
But my main concern is how this may be perceived by google and especially whether it may make it more likely being penalized for thin content. We already were hit end of june by panda/phantom and we are working on adding also more unique content, but this would be something that we could do additionally and basically instantaneously. Just do not want to make things worse.0 -
With or without the "www." ?
Is there any benefit whatsoever to having the www. in the URL?
Intermediate & Advanced SEO | | JordanBrown0 -
Best way to handle page filters and sorts
Hello Mozzers, I have a question that has to do with the best way to handle filters and sorts with Googlebot. I have a page that returns a list of widgets. I have a "root" page about widgets and then filter and sort functionality that shows basically the same content but adds parameters to the URL. For example, if you filter the page of 10 widgets by color, the page returns 3 red widgets on the top, and 7 non-red widgets on the bottom. If you sort by size, the page shows the same 10 widgets sorted by size. We use traditional php url parameters to pass filters and sorts, so obviously google views this as a separate URL. Right now we really don't do anything special in Google, but I have noticed in the SERPs sometimes if I search for "Widgets" my "Widgets" and "Widgets - Blue" both rank close to each other, which tells me Google basically (rightly) thinks these are all just pages about Widgets. Ideally though I'd just want to rank for my "Widgets" root page. What is the best way to structure this setup for googlebot? I think it's maybe one or many of the following, but I'd love any advice: put rel canonical tag on all of the pages with parameters and point to "root" use the google parameter tool and have it not crawl any urls with my parameters put meta no robots on the parameter pages Thanks!
Intermediate & Advanced SEO | | jcgoodrich0 -
What is the difference between link rel="canonical" and meta name="canonical"?
Hi mozzers, I would like to know What is the difference between link rel="canonical" and meta name="canonical"? and is it dangerous to have both of these elements combined together? One of my client's page has the these two elements and kind of bothers me because I only know link rel="canonical" to be relevant to remove duplicates. Thanks!
Intermediate & Advanced SEO | | Ideas-Money-Art0 -
Best Way to Consolidate Domains?
Hello, My company has four websites in the same vertical and we're planning to integrate them all on our main company site. So instead of www.siteone.com, www.sitetwo.com, www.sitethree.com, etc. It would be www.branddomain.com/site-one, www.branddomain.com/site-two, etc. I have a few questions... Should we redirect the old domains to the new directories or leave the old domains and stop updating them with new content... Then have the old content, links, etc. 301 to the same content on the new site? Should we literally move all of the content to the new directories? Any tips are appreciated. It's probably pretty obvious that I don't have a ton of technical skills... my development team will be doing the heavy lifting. I just want to be sure we do this correctly from an SEO perspective! Thanks for the help, please let me know if I can clarify anything. E
Intermediate & Advanced SEO | | essdee0 -
ECommerce products duplicate content issues - is rel="canonical" the answer?
Howdy, I work on a fairly large eCommerce site, shop.confetti.co.uk. Our CMS doesn't allow us to have 1 product with multiple colour and size options so we created individual product pages for each product variation. This of course means that we have duplicate content issues. The layout of the shop works like this; there is a product group page (here is our disposable camera group) and individual product pages are below. We also use a Google shopping feed. I'm sure we're being penalised as so many of the products on our site are duplicated so, my question is this - is rel="canonical" the best way to stop being penalised and how can I implement it? If not, are there any better suggestions? Also, we have targeted some long-tail keywords in some of the product descriptions so will using rel-canonical effect this or the Google shopping feed? I'd love to hear experiences from people who have been through similar things and what the outcome was in terms of ranking/ROI. Thanks in advance.
Intermediate & Advanced SEO | | Confetti_Wedding0