Best way to "Prune" bad content from large sites?
-
I am in the process of pruning my sites for low-quality/thin content. The issue is that I have multiple sites with 40k+ pages and need a more efficient way of finding the low-quality content than looking at each page individually. Is there an ideal way to find the pages that are worth noindexing that will speed up the process without potentially harming any valuable pages?
The current plan of action is to pull data from analytics, and if a URL hasn't brought any traffic in the last 12 months, it is safe to assume the page is not beneficial to the site. My concern is that some of these pages might have links pointing to them, and I want to make sure we don't lose that link juice. But, assuming we just noindex the pages, we should still have the authority pass along... and in theory, the pages that haven't brought any traffic to the site in a year probably don't have much authority to begin with.
Any recommendations on the best way to prune content efficiently on sites with hundreds of thousands of pages? Also, is there a benefit to noindexing the pages vs. deleting them? What is the preferred method, and why?
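For what it's worth, the plan above could be roughed out in a few lines of Python. This is only a sketch under assumptions: `all_urls` is a hypothetical full URL list (from a sitemap or crawl, since pages with zero visits often don't appear in an analytics export at all), and `sessions_by_url` and `backlinks_by_url` are hypothetical dicts built from whatever analytics and link tool exports are available. The "review" bucket is there so that pages with no traffic but some inbound links get looked at by a person instead of being pruned automatically.

```python
def classify_pages(all_urls, sessions_by_url, backlinks_by_url):
    """Sort URLs into keep / review / prune buckets.

    all_urls: every URL on the site (hypothetical input, e.g. from a sitemap).
    sessions_by_url: sessions over the last 12 months, keyed by URL.
    backlinks_by_url: count of external links, keyed by URL.
    URLs missing from either dict are treated as having a count of zero.
    """
    buckets = {"keep": [], "review": [], "prune": []}
    for url in all_urls:
        if sessions_by_url.get(url, 0) > 0:
            # Brought traffic in the window: leave it alone.
            buckets["keep"].append(url)
        elif backlinks_by_url.get(url, 0) > 0:
            # No traffic, but links point at it: check by hand first.
            buckets["review"].append(url)
        else:
            # No traffic and no links: candidate for noindex or removal.
            buckets["prune"].append(url)
    return buckets


# Tiny usage example with made-up paths.
result = classify_pages(
    ["/a", "/b", "/c"],
    {"/a": 12},
    {"/b": 3},
)
print(result)
```

The thresholds (any traffic at all, any link at all) are deliberately crude; a real pass would probably want minimums tuned to the site.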
-
I have a section of my website where I heavily use embedded content: embeds from YouTube, Slideshare, Twitter, Quora, etc. Google thinks they're thin, and they don't show up in my analytics because you can read the content without clicking through to the page.
http://getonthemap.us/twitter/blog
But I like them, and I think they're helpful. So I noindexed all but one of the blog posts in that section. That retains the backlinks to the posts but cleans things up with Google.
If you're deleting, can't you do that quickly from your console?
-
It's hard to say exactly without seeing your site since there are so many potential variables (e.g. are most of your blog posts low quality or just a minority? etc) that would define the best way to go about it.
What I can say though is that you're on the right track as far as using analytics data to determine which ones are providing value right now. There is a danger in losing some rankings if you go removing a huge volume of these posts. Unless they're utter rubbish posts, they'll likely be providing relevance signals to Google on what your site is about. That said, I do think it's a necessary evil and I'd expect you'll be rewarded for it in the long run provided you start replacing the trash with high quality posts in the future.
As for the benefits, if they really are low quality then user engagement is going to be terrible, which is obviously not what you should be aiming for. They're also going to be chewing up your crawl budget for no good reason, so the leaner your site is, the better base you have to start rebuilding with quality instead of quantity. For the same reason, I generally suggest removing tags and categories that aren't providing any actual benefit too. In most cases I see, they're just there either "for good SEO" or because the site owner thinks that's how users are browsing their site, but in almost all cases that's not true. As always, check your own data on this to be sure.
As for removing vs noindex, this one is always contentious but I lean toward removing simply because it's going to clean things up for the user too and ultimately they should be your primary focus. Having 40,000+ pages of trash on your website is a fantastic indicator to them that your site may not be somewhere they want to be and noindexing them won't do anything to change the user's experience.
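To make the difference concrete, here is a minimal sketch of what each choice could look like at the HTTP level. The function name and the action labels are made up for illustration, and answering removed pages with 410 (instead of 404) is an assumption on my part, not something settled in this thread. `X-Robots-Tag: noindex` is the header equivalent of the robots meta tag.

```python
def response_for(action):
    """Return (status_code, extra_headers) for a pruning decision.

    The action labels ("keep", "noindex", "remove") are hypothetical.
    """
    if action == "keep":
        # Normal page, indexable as usual.
        return 200, {}
    if action == "noindex":
        # Page stays live for visitors; the header asks search
        # engines not to include it in their index.
        return 200, {"X-Robots-Tag": "noindex"}
    if action == "remove":
        # Page is gone for visitors and crawlers alike.
        return 410, {}
    raise ValueError(f"unknown action: {action!r}")


for action in ("keep", "noindex", "remove"):
    print(action, response_for(action))
```

Note that only the "remove" branch changes anything for the visitor, which is the point made above: a noindexed page is still there for users to land on.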
Hope that helps!