Best way to "Prune" bad content from large sites?
-
I am in process of pruning my sites for low quality/thin content. The issue is that I have multiple sites with 40k + pages and need a more efficient way of finding the low quality content than looking at each page individually. Is there an ideal way to find the pages that are worth no indexing that will speed up the process but not potentially harm any valuable pages?
Current plan of action is to pull data from analytics and if the url hasn't brought any traffic in the last 12 months then it is safe to assume it is a page that is not beneficial to the site. My concern is that some of these pages might have links pointing to them and I want to make sure we don't lose that link juice. But, assuming we just no index the pages we should still have the authority pass along...and in theory, the pages that haven't brought any traffic to the site in a year probably don't have much authority to begin with.
Recommendations on best way to prune content on sites with hundreds of thousands of pages efficiently? Also, is there a benefit to no indexing the pages vs deleting them? What is the preferred method, and why?
-
I have a section of my website where I heavily use embedded content. Embeds from Youtube, Slideshare, Twitter, Quora etc. Google thinks they're thin, and they don't show up in my analytics because you can read the content without clicking on the page.
http://getonthemap.us/twitter/blog
But I like them, and I think they're helpful. So I no-indexed all but one of the blog posts in that section. It retains the backlinks to the posts, but cleans me up with Google.
If you're deleting, can't you do that quickly from your console?
-
It's hard to say exactly without seeing your site since there are so many potential variables (e.g. are most of your blog posts low quality or just a minority? etc) that would define the best way to go about it.
What I can say though is that you're on the right track as far as using analytics data to determine which ones are providing value right now. There is a danger in losing some rankings if you go removing a huge volume of these posts. Unless they're utter rubbish posts, they'll likely be providing relevance signals to Google on what your site is about. That said, I do think it's a necessary evil and I'd expect you'll be rewarded for it in the long run provided you start replacing the trash with high quality posts in the future.
As for the benefits, if they really are low quality then user engagement is going to be terrible which is obviously not what you should be aiming for. It's also going to be chewing up your crawl budget for no good reason so the leaner your site is, the better base you have to start rebuilding with quality instead of quantity. For the same reason, I generally suggest removing tags and categories that aren't providing any actual benefit too - in most cases I see they're just there either "for good SEO" or because the site owners things that's how users are browsing their site but in almost all cases, that's not true. As always, check your own data on this to be sure.
As for removing vs noindex, this one is always contentious but I lean toward removing simply because it's going to clean things up for the user too and ultimately they should be your primary focus. Having 40,000+ pages of trash on your website is a fantastic indicator to them that your site may not be somewhere they want to be and noindexing them won't do anything to change the user's experience.
Hope that helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Call for Help. Hit Badly with "Medic" and another 30% Loss with Sept 28th Update
Hi Everyone, I am not sure how this is all happening. We have been online for about 15 years, and now we are at our lowest amount of traffic in about 10 years. Our sites are www.bestpricenutrition.com and www.mysupplementstore.com. We sell commodity items, but I have focused on unique product descriptions, tons of UGC, blog posts and guides for awhile now and it has always done us well. Until as of late. This is what I feel led up to this, but I am hoping there is something I missed. May 1st, 2018: Migrated www.bestpricenutrition.com and www.mysupplementstore.com from Shopify. Similar sites, but almost all unique content. We purchased www.mysupplementstore.com about 8 years ago. A ton of traffic and sales, which is why we didn't just redirect it. Around May 25th: www.mysupplementstore.com took a big hit and lost almost 40% of its traffic. Nothing happened to www.bestpricenutrition.com, we actually increased traffic. Aug 1st Update: www.mysupplementstore.com lost another 25% of its traffic. www.bestpricenutrition.com lost about 40% of it's traffic. Sept 28th: Nothing happened to www.mysupplementstore.com, but www.bestpricenutrition.com lost another 30% of it's traffic. So I have been trying to figure out if there is anything technically wrong, but doesn't seem so. These are issues we discovered in August. During the migration, the reviews from each site were syndicated to both websites. There were 1000's. This was resolved in mid August. During the migration, the company doing the migration pushed our blog posts to both websites. 100's of blog posts duplicated to each website. This was resolved mid August. We found that a disgruntled employee instead writing unique content for our product pages, she was copying them one from another. This was about 100 product pages, which we have since resolved. What's Left I noticed on www.bestpricenutrition.com that we have 100's of blog posts that are getting hardly any traffic. I had trimmed www.mysupplementstore.com of this low traffic content. I am working on www.bestpricenutrition.com still. I have been in this industry since 2003, survived 2012, but have exhausted everything I know to figure this out. It's another sob story I know, but trying to keep everyone's job alive here, but it doesn't look like it's going to happen. Any help would be greatly appreciated.
Intermediate & Advanced SEO | | vetofunk0 -
Best way to do site seals for clients to have on their sites
I am about to help release a product which also gives people a site seal for them to place on their website. Just like the geotrust, comodo, symantec, rapidssl and other web security providers do.
Intermediate & Advanced SEO | | ssltrustpaul
I have notices all these siteseals by these companies never have nofollow on their seals that link back to their websites. So i am wondering what is the best way to do this. Should i have a nofollow on the site seal that links back to domain or is it safe to not have the nofollow.
It wont be doing any keyword stuffing or anything, it will probly just have our domain in the link and that is all. The problem is too, we wont have any control of where customers place these site seals. From experience i would say they will mostly likely always be placed in the footer on every page of the clients website. I would like to hear any and all thoughts on this. As i can't get a proper answer anywhere i have asked.0 -
Semi-duplicate content yet authoritative site
So I have 5 real estate sites. One of those sites is of course the original, and it has more/better content on most of the pages than the other sites. I used to be top ranked for all of the subdivsion names in my town. Then when I did the next 2-4 sites, I had some sites doing better than others for certain keywords, and then I have 3 of those sites that are basically the same URL structures (besides the actual domain) and they aren't getting fed very many visits. I have a couple of agents that work with me that I loaned my sites to to see if that would help since it would be a different name. My same youtube video is on each of the respective subdivision pages of my site and theirs. Also, their content is just rewritten content from mine about the same length of content. I have looked over and seen a few of my competitors who only have one site and their URL structures arent good at all, and their content isn't good at all and a good bit of their pages rank higher than my main site which is very frustrating to say the least since they are actually copy cats to my site. I sort of started the precedent of content, mapping the neighborhood, how far that subdivision is from certain landmarks, and then shot a video of each. They have pretty much done the same thing and are now ahead of me. What sort of advice could you give me? Right now, I have two sites that are almost duplicate in terms of a template and same subdivsions although I did change the content the best I could, and that site is still getting pretty good visits. I originally did it to try and dominate the first page of the SERPS and then Penguin and Panda came out and seemed to figure that game out. So now, I would still like to keep all the sites, but I'm assuming that would entail making them all unique, which seems to be tough seeing as though my town has the same subdivisions. Curious as to what the suggestions would be, as I have put a lot of time into these sites. If I post my site will it show up in the SERPS? Thanks in advance
Intermediate & Advanced SEO | | Veebs0 -
Large sites linking to us in their menu
Hello, I am digging in to our in-links in WMT and notice that we have a number of sites that link to us in their menu or every page on their site, making for hundreds of thousands of links to our site. Here is an example:
Intermediate & Advanced SEO | | evansluke
http://www.askthebookie.com/ (in the topmost right menu there is a link to our forums) http://www.covers.com/postingforum/ There are about a dozen or so sites that link to our homepage tens-of-thousands of times. Should I disavow them, or is this be viewed as a legitimate link? Thanks in advance for any help!0 -
Best Approach to Get Backlinks for this site
Hello, What would be a good approach to gain backlinks for this site: www.nlpca.com The owners don't have much time to write content. I as the consultant have time but do not have the expertise the owners do. The people that run the site are authorities in the field. Thanks!
Intermediate & Advanced SEO | | BobGW0 -
Question about best approach to site structure
I am curious if anyone can share some advice. I am working on planning architecture for a tour company. The key piece of the content strategy will be providing details on each of the tour destinations, with associated profiles for each city within those destinations. Lots of content, which should be great for the SEO strategy. With regards to the architecture, I have a ‘destinations’ section on the Website where users can access each of the key destinations served by the tour company. My question is – from a planning perspective I can organize my folder structure in a few different ways. http://www.companyurl.com/destinations/touring-regions/cities/ or http://www.companyurl.com/destinations/ http://www.companyurl.com/touring-regionA/ http://www.companyurl.com/touring-regionB/cities-profile/ I am curious if anyone has an opinion on what might perform best in terms of the site structure from an SEO perspective. My fear is taking all of this rich content and placing it so many tiers down in the architecture of the site. Any advice that could be offered would be appreciated. Thanks.
Intermediate & Advanced SEO | | VERBInteractive0 -
Pagination: rel="next" rel="prev" in ?
With Google releasing that instructional on proper pagination I finally hunkered down and put in a site change request. I wanted the rel="next" and rel="prev" implemented… and it took two weeks for the guy to get it done. Brutal and painful. When I looked at the source it turned out he put it in the body above the pagination links… which is not what I wanted. I wanted them in the . Before I respond to get it properly implemented I want a few opinions - is it okay to have the rel="next" in the body? Or is it pretty much mandatory to put it in the head? (Normally, if I had full control over this site, I would just do it myself in 2 minutes… unfortunately I don't have that luxury with this site)
Intermediate & Advanced SEO | | BeTheBoss1 -
Does a mobile site count as duplicate content?
Are there any specific guidelines that should be followed for setting up a mobile site to ensure it isn't counted as duplicate content?
Intermediate & Advanced SEO | | nicole.healthline0