Best way to "Prune" bad content from large sites?
-
I am in the process of pruning my sites for low-quality/thin content. The issue is that I have multiple sites with 40k+ pages and need a more efficient way of finding the low-quality content than looking at each page individually. Is there an ideal way to find the pages that are worth noindexing, one that will speed up the process without potentially harming any valuable pages?
My current plan of action is to pull data from analytics; if a URL hasn't brought any traffic in the last 12 months, then it is safe to assume the page is not beneficial to the site. My concern is that some of these pages might have links pointing to them, and I want to make sure we don't lose that link juice. But assuming we just noindex the pages, the authority should still pass along... and in theory, pages that haven't brought any traffic to the site in a year probably don't have much authority to begin with.
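For reference, here's a rough sketch of the kind of script I had in mind for flagging candidates, assuming a landing-page/sessions export from analytics and a full URL list from a crawler. The file names and column headers below are just placeholders for whatever the exports actually use:

```python
import csv
from urllib.parse import urlparse

# Placeholder file names and column headers -- adjust to whatever your exports actually use.
CRAWL_EXPORT = "all_urls.csv"          # e.g. a crawler export with a "url" column
ANALYTICS_EXPORT = "landing_pages.csv" # e.g. "landing_page" and "sessions" columns, last 12 months

def normalize(url):
    """Reduce a full URL or bare path to a comparable path string."""
    path = urlparse(url).path or "/"
    return path.rstrip("/") or "/"

def pages_with_traffic(path):
    """Return the set of landing-page paths that received at least one session."""
    pages = set()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            sessions = float((row["sessions"] or "0").replace(",", ""))
            if sessions > 0:
                pages.add(normalize(row["landing_page"]))
    return pages

def crawled_urls(path):
    with open(path, newline="", encoding="utf-8") as f:
        return [row["url"].strip() for row in csv.DictReader(f)]

if __name__ == "__main__":
    traffic = pages_with_traffic(ANALYTICS_EXPORT)
    # Anything crawled but absent from the traffic set is a pruning candidate.
    candidates = [u for u in crawled_urls(CRAWL_EXPORT) if normalize(u) not in traffic]
    with open("prune_candidates.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["url"])
        writer.writerows([u] for u in candidates)
    print(f"{len(candidates)} zero-traffic URLs written to prune_candidates.csv")
```

The idea is that the output list would then be cross-checked against a backlink export, so anything with links pointing at it gets reviewed by hand rather than pruned automatically.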
Any recommendations on the best way to efficiently prune content on sites with hundreds of thousands of pages? Also, is there a benefit to noindexing the pages vs. deleting them? What is the preferred method, and why?
-
I have a section of my website where I heavily use embedded content: embeds from YouTube, SlideShare, Twitter, Quora, etc. Google thinks those pages are thin, and they don't show up in my analytics because you can read the content without clicking through to the page.
http://getonthemap.us/twitter/blog
But I like them, and I think they're helpful. So I noindexed all but one of the blog posts in that section. That retains the backlinks to the posts but cleans things up with Google.
If you're deleting, can't you do that quickly from your console?
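If you go the noindex route at scale, it's also worth spot-checking that the directive actually made it onto the pages. A rough sketch of that check, looking at both the meta robots tag and the X-Robots-Tag header; the URL list file name is a placeholder and it assumes the requests library is installed:

```python
import re
import requests  # third-party: pip install requests

def is_noindexed(url):
    """Return True if the page signals noindex via meta robots or the X-Robots-Tag header."""
    resp = requests.get(url, timeout=15, headers={"User-Agent": "noindex-audit/0.1"})
    header = resp.headers.get("X-Robots-Tag", "").lower()
    # Simple pattern; assumes the name attribute comes before content in the meta tag.
    meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
                     resp.text, re.IGNORECASE)
    return "noindex" in header or bool(meta and "noindex" in meta.group(1).lower())

if __name__ == "__main__":
    with open("noindexed_urls.txt") as f:  # placeholder: one URL per line
        for url in (line.strip() for line in f if line.strip()):
            print(f"{url}  ->  {'noindex OK' if is_noindexed(url) else 'MISSING noindex'}")
```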
-
It's hard to say exactly without seeing your site, since there are so many potential variables (e.g., are most of your blog posts low quality, or just a minority?) that would determine the best way to go about it.
What I can say, though, is that you're on the right track in using analytics data to determine which pages are providing value right now. There is a danger of losing some rankings if you remove a huge volume of these posts: unless they're utter rubbish, they're likely providing relevance signals to Google about what your site covers. That said, I do think it's a necessary evil, and I'd expect you'll be rewarded for it in the long run, provided you start replacing the trash with high-quality posts in the future.
As for the benefits: if these pages really are low quality, then user engagement is going to be terrible, which is obviously not what you should be aiming for. They're also chewing up your crawl budget for no good reason, so the leaner your site is, the better base you have for rebuilding with quality instead of quantity. For the same reason, I generally suggest removing tag and category pages that aren't providing any actual benefit either. In most cases they're there "for good SEO" or because the site owner thinks that's how users browse the site, but in almost all cases that's not true. As always, check your own data on this to be sure.
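A quick way to run that check is something like the sketch below, which totals organic entrances for your archive pages from a landing-page export. The file name and path patterns are placeholders; swap in whatever your tag/category URLs actually look like:

```python
import csv
from collections import defaultdict

# Placeholder patterns -- swap in whatever your tag/category URLs actually look like.
ARCHIVE_PATTERNS = ["/tag/", "/category/"]

sessions_by_pattern = defaultdict(float)
pages_by_pattern = defaultdict(int)

with open("landing_pages.csv", newline="", encoding="utf-8") as f:  # placeholder export name
    for row in csv.DictReader(f):
        for pattern in ARCHIVE_PATTERNS:
            if pattern in row["landing_page"]:
                sessions_by_pattern[pattern] += float((row["sessions"] or "0").replace(",", ""))
                pages_by_pattern[pattern] += 1

for pattern in ARCHIVE_PATTERNS:
    print(f"{pattern}: {pages_by_pattern[pattern]} archive pages in the export, "
          f"{int(sessions_by_pattern[pattern])} sessions total")
```

Archive pages that don't appear in the export at all received zero entrances over the period, which is usually answer enough.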
As for removing vs. noindexing, this one is always contentious, but I lean toward removing, simply because it cleans things up for the user too, and ultimately users should be your primary focus. Having 40,000+ pages of trash on your website is a pretty clear indicator to them that your site may not be somewhere they want to be, and noindexing those pages won't do anything to change the user experience.
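And if you do remove them, it's worth confirming in bulk that the pruned URLs really return a 404 or 410 (and that anything you deliberately redirected returns a single 301) rather than a soft-404 200. A rough sketch, assuming a plain text file of the pruned URLs and the requests library:

```python
import requests  # third-party: pip install requests

EXPECTED = {404, 410, 301, 308}  # removed pages or deliberate redirects

if __name__ == "__main__":
    with open("pruned_urls.txt") as f:  # placeholder: one pruned URL per line
        for url in (line.strip() for line in f if line.strip()):
            resp = requests.get(url, timeout=15, allow_redirects=False,
                                headers={"User-Agent": "prune-audit/0.1"})
            flag = "" if resp.status_code in EXPECTED else "  <-- check this one"
            print(f"{resp.status_code}  {url}{flag}")
```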
Hope that helps!