Easy Question: noindex meta tag vs. robots.txt
-
This seems like a dumb question, but I'm not sure what the answer is. I have an ecommerce client whose site has a couple of subdirectories, "gallery" and "blog". Neither directory gets much traffic or drives many conversions, so I want to remove the pages so they don't drain PageRank from more important pages. Does this sound like a good idea?
I was thinking of either disallowing the folders via the robots.txt file, adding a "noindex" tag, 301 redirecting the pages, or deleting them. Can you help me determine which is best?
**DEINDEX:** As I understand it, the noindex meta tag still allows robots to crawl the pages, but they won't be indexed. The supposed good news is that it still allows link juice to pass through. That seems like a bad thing to me, because I don't want to waste link juice on these pages; the idea is to keep my PageRank from being diluted. A related question: if PageRank is finite, does Google still treat these pages as part of the site even though it isn't indexing them?
If I do deindex these pages, I think there are quite a few internal links pointing to them. Even though the pages are deindexed, they still exist, so it's not as if the site would return a 404, right?
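For reference, here's roughly what I mean by the noindex option (a minimal sketch of the standard meta tag, placed in each page's head):

```html
<!-- Tells crawlers not to index this page while still following its links -->
<meta name="robots" content="noindex, follow">
```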
**ROBOTS.TXT:** As I understand it, this keeps robots from crawling the pages, so they won't be indexed and link juice won't pass through them. I don't want to waste the PageRank of the links pointing to these pages, so is this a bad option?
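The robots.txt version would look something like this, assuming the directories live at /gallery/ and /blog/:

```
# Block all crawlers from the two subdirectories
User-agent: *
Disallow: /gallery/
Disallow: /blog/
```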
**301 REDIRECT:** What if I just 301 redirect all these pages back to the homepage? Is this an easy answer? Part of the problem with this solution is that I'm not sure it's permanent, but more importantly, 80% of the site is currently made up of blog and gallery pages, and I think it would be strange to have the vast majority of the site 301 redirecting to the home page. What do you think?
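On an Apache server, I assume the redirect rules would look something like this in .htaccess (the paths are hypothetical):

```apache
# Permanently redirect everything under /gallery/ and /blog/ to the homepage
RedirectMatch 301 ^/gallery/.*$ /
RedirectMatch 301 ^/blog/.*$ /
```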
**DELETE PAGES:** Maybe I could just delete all the pages. This would keep the pages from taking link juice and would deindex them, but I think there are quite a few internal links to these pages. How would you find all the internal links that point to them? There are hundreds.
-
Hello Santaur,
I'm afraid this question isn't as easy as you may have thought at first. It really depends on what is on the pages in those two directories, what they're being used for, who visits them, etc. Certainly, removing them altogether wouldn't be as terrible as some people might think IF those pages are of poor quality, have no external links, and very few, if any, visitors. It sounds to me like you need a "Content Audit": crawl the entire site with a tool like Screaming Frog, then pull relevant metrics for those pages (e.g. Google Analytics visits, Moz Page Authority, external links) so you can look at them and make informed decisions about which pages to improve, remove, or leave as-is.
Any page that gets "removed" leaves you with another choice: allow it to 404/410, or 301 redirect it. That decision should be easy to make on a page-by-page basis after the content audit, because you will be able to see which pages have external links and/or visitors within the time period you specify (e.g. 90 days). Pages you have decided to remove that have no external links and no visits in 90 days can probably just be deleted. The others can be 301 redirected to a more appropriate page, such as the blog home page, a top-level category page, a similar page, or, if all else fails, the site home page.
Of course, any page that gets removed, whether it redirects or 404s/410s, should have all internal links to it updated as soon as possible. The scan you did with Screaming Frog during the content audit will give you all internal links pointing to each URL, which should speed up that process considerably.
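If the list is long, you can also filter the export programmatically. Here's a rough sketch that narrows a Screaming Frog inlinks export down to the affected sections (the file name and column headers are assumptions and may vary by version):

```python
# Filter a Screaming Frog "All Inlinks" CSV export down to internal links
# pointing at the /gallery/ and /blog/ sections.
import csv

with open("all_inlinks.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        destination = row.get("Destination", "")
        if "/gallery/" in destination or "/blog/" in destination:
            print(row.get("Source", ""), "->", destination)
```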
Good luck!
-
I would certainly think twice about removing those pages, as in most cases they're of value to both your SEO and your users. If you do decide to go this way and remove them, I would redirect all the pages in these subdirectories to another page (say, the homepage). Even though those pages get limited traffic, you still want to make sure that people who land on them are redirected to a page that does exist.
-
Are you sure you want to do this? You say 80% of the site consists of gallery and blog pages. You also say there are a lot of internal links to those pages. Are you perhaps underestimating the value of long-tail traffic?
To answer your specific question: yes, link juice will still pass through to pages that are noindexed; they just won't ever show up in search results. Using the robots noindex directive gets you the same result. 301 redirects will pass all your link juice back to the home page, but they make for a lousy user experience. The same goes for deleting pages.
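As an aside, the same noindex directive can also be sent as an HTTP response header instead of a meta tag, which is handy for non-HTML files. A hypothetical Apache configuration (mod_headers required, paths assumed):

```apache
# Send "noindex, follow" as an HTTP header for everything under /gallery/ and /blog/
<LocationMatch "^/(gallery|blog)/">
    Header set X-Robots-Tag "noindex, follow"
</LocationMatch>
```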
Related Questions
-
Using one robots.txt for two websites
I have two websites that are hosted in the same CMS. Rather than having two separate robots.txt files (one for each domain), my web agency has created one which lists the sitemaps for both websites, like this:

```
User-agent: *
Disallow:

Sitemap: https://www.siteA.org/sitemap
Sitemap: https://www.siteB.com/sitemap
```

Is this ok? I thought you needed one robots.txt per website which provides the URL for the sitemap. Will having both sitemap URLs listed in one robots.txt confuse the search engines?
Technical SEO | ciehmoz0
-
Indexed pages
Just started a site audit and trying to determine the number of pages on a client site, and whether more pages are being indexed than actually exist. I've used four tools and got four very different answers:

- Google Search Console: 237 indexed pages
- Google search using the site: command: 468 results
- Moz Site Crawl: 1013 unique URLs
- Screaming Frog: 183 page titles, 187 URIs (note: this is a free licence, but it shouldn't cut off until 500)

Can anyone shed any light on why they differ so much? And where does the truth lie?
Technical SEO | muzzmoz1
-
Redirect Question
We have a client that just did a redesign and development, and the new design didn't really match their current structure. They said they didn't want to worry about matching the old site structure and never put any effort into SEO. Here is the situation: they had a blog located on a subdomain, such as blog.domain.com; now their blog is located at domain.com/blog. They want to create redirects so that all the old blog URLs on the subdomain now point to domain.com/blog/post-name. What is the best way of doing that? Through .htaccess?
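For what it's worth, a subdomain-to-subdirectory move like this is typically handled with mod_rewrite. A minimal sketch, assuming the old and new post paths match one-to-one (the hostnames here are the question's placeholders):

```apache
# Hypothetical .htaccess on the old blog.domain.com subdomain:
# send each old post URL to the matching path under domain.com/blog/
RewriteEngine On
RewriteCond %{HTTP_HOST} ^blog\.domain\.com$ [NC]
RewriteRule ^(.*)$ https://domain.com/blog/$1 [R=301,L]
```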
Technical SEO | Beardo0
-
Do H2 tags carry more weight than H4 tags?
Of course H tags are key signals for relevance in search. Does an H2 tag send a significantly "louder" signal than an H4 tag?
Technical SEO | aj6130
-
Google is indexing content blocked by robots.txt
Hi, Google is indexing some URLs that I don't want indexed, and it is also indexing the same URLs over https. These URLs are blocked in the robots.txt file. I've tried to block these URLs through Google Webmaster Tools, but Google doesn't let me do it because the URLs are https. The robots.txt file is correct, so what can I do to keep this content from being indexed?
Technical SEO | elisainteractive0
-
Restricted by robots.txt and soft 404 issues (related).
In our Webmaster Tools we have 35K (ish) URLs that are restricted by robots.txt, as well as 1200 (ish) soft 404s. We can't seem to figure out how to properly resolve these URLs so that they no longer show up this way. Our traffic from SEO has taken a major hit over the last 2 weeks because of this. Any help? Thanks, Libby
Technical SEO | GristMarketing0
-
Is robots.txt a must-have for a 150-page, well-structured site?
Looking at my logs, I see dozens of 404 errors each day from different bots trying to load robots.txt. I have a small site (150 pages) with clean navigation that allows the bots to index the whole site (which they are doing). There are no secret areas I don't want the bots to find (the secret areas are behind a login, so the bots won't see them). I have used rel=nofollow for internal links that point to my login page. Is there any reason to include a generic robots.txt file that contains "User-agent: *"? I have a minor reason: to stop getting 404 errors and clean up my error logs so I can find other issues that may exist. But I'm wondering if not having a robots.txt file is the same as having some default blank file (or a one-line file giving all bots full access)?
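For reference, the generic allow-all file in question is just two lines (an empty Disallow permits everything):

```
# Minimal allow-all robots.txt
User-agent: *
Disallow:
```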
Technical SEO | scanlin0