Easy Question: regarding no index meta tag vs robot.txt
-
This seems like a dumb question, but I'm not sure what the answer is. I have an ecommerce client who has a couple of subdirectories "gallery" and "blog". Neither directory gets a lot of traffic or really turns into much conversions, so I want to remove the pages so they don't drain my page rank from more important pages. Does this sound like a good idea?
I was thinking of either disallowing the folders via robot.txt file or add a "no index" tag or 301redirect or delete them. Can you help me determine which is best.
**DEINDEX: **As I understand it, the no index meta tag is going to allow the robots to still crawl the pages, but they won't be indexed. The supposed good news is that it still allows link juice to be passed through. This seems like a bad thing to me because I don't want to waste my link juice passing to these pages. The idea is to keep my page rank from being dilluted on these pages. Kind of similar question, if page rank is finite, does google still treat these pages as part of the site even if it's not indexing them?
If I do deindex these pages, I think there are quite a few internal links to these pages. Even those these pages are deindexed, they still exist, so it's not as if the site would return a 404 right?
ROBOTS.TXT As I understand it, this will keep the robots from crawling the page, so it won't be indexed and the link juice won't pass. I don't want to waste page rank which links to these pages, so is this a bad option?
**301 redirect: **What if I just 301 redirect all these pages back to the homepage? Is this an easy answer? Part of the problem with this solution is that I'm not sure if it's permanent, but even more importantly is that currently 80% of the site is made up of blog and gallery pages and I think it would be strange to have the vast majority of the site 301 redirecting to the home page. What do you think?
DELETE PAGES: Maybe I could just delete all the pages. This will keep the pages from taking link juice and will deindex, but I think there's quite a few internal links to these pages. How would you find all the internal links that point to these pages. There's hundreds of them.
-
Hello Santaur,
I'm afraid this question isn't as easy as you may have thought at first. It really depends on what is on the pages in those two directories, what they're being used for, who visits them, etc... Certainly removing them altogether wouldn't be as terrible as some people might think IF those pages are of poor quality, have no external links, and very few - if any - visitors. It sounds to me that you might need a "Content Audit" wherein the entire site is crawled, using a tool like Screaming Frog, and then relevant metrics are pulled for those pages (e.g. Google Analytics visits, Moz Page Authority and external links...) so you can look at them and make informed decisions about which pages to improve, remove or leave as-is.
Any page that gets "removed" will leave you with another choice: Allow to 404/410 or 301 redirect. That decision should be easy to make on a page-by-page basis after the content audit because you will be able to see which ones have external links and/or visitors within the time period specified (e.g. 90 days). Pages that you have decided to "Remove" which have no external links and no visits in 90 days can probably just be deleted. The others can be 301 redirected to a more appropriate page, such as the blog home page, top level category page, similar page or - if all else fails - the site home page.
Of course any page that gets removed, whether it redirects or 404s/410s should have all internal links updated as soon as possible. The scan you did with Screaming Frog during the content audit will provide you with all internal links pointing to each URL, which should speed up that process for you considerably.
Good luck!
-
I would certainly think twice about removing those pages as they're in most cases of value for both your SEO as your users. If you would decide to go this way and to have them removed I would redirect all the pages belonging to these subdirectories to another page (let's say the homepage). Although you have a limited amount of traffic there you still want to make sure that the people who land on these pages get redirected to a page that does exist.
-
Are you sure you want to do this? You say 80% of the site consists of gallery and blog pages. You also say there are a lot of internal links to those pages. Are you perhaps under estimating the value of long- tail traffic
To answer your specific question, yes link juice will still pass thru to the pages that are no indexed. They just won't ever show up in search results. Using robots noindex gets you the same result. 301 redirects will pass all your link juice back to the home page, but makes for a lousy user experience. Same for deleting pages.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt on http vs. https
We recently changed our domain from http to https. When a user enters any URL on http, there is an global 301 redirect to the same page on https. I cannot find instructions about what to do with robots.txt. Now that https is the canonical version, should I block the http-Version with robots.txt? Strangely, I cannot find a single ressource about this...
Technical SEO | | zeepartner0 -
Https indexed...how?
Hello Moz, Since a while i am struggling with a SEO case: At the moment a https version of a homepage of a client of us is indexed in Google. Thats really strange because the url is redirected to an other website url for three weeks now. And we did everything to make clear to google that he has to index the other url.
Technical SEO | | Searchresult
So we have a few homepage urls A https://www.website.nl
B https://www.websites.nl/category
C http://www.websites.nl/category What we did: Redirected A with a 301 to B, a redirect from A or B to C is difficult because of the security issue with the ssl certificate. We put the right canonical url (VERSION C) on every version of the homepage(A,B) We only put the canonical urls in the sitemap.xml, only version C and uploaded it to Google Webmastertools We changed all important internal links to Version C We also get some valuable external backlinks to Version C Is there something i missed or i forget to say to Google hey look you've got the wrong url indexed, you have to index version C? How is it possible Google still prefers Version A after doing al those changes three weeks a go? I'am really looking forward to your answer. Thanks a lot in advanced! Greetz Djacko0 -
A few misc Webmaster tools questions & Robots.txt etc
Hi I have a few general misc questions re Robots.tx & GWT: 1) In the Robots.txt file what do the below lines block, internal search ? Disallow: /?
Technical SEO | | Dan-Lawrence
Disallow: /*? 2) Also the sites feeds are blocked in robots.txt, why would you want to block a sites feeds ? **3) **What's the best way to deal with the below: - old removed page thats returning a 500 response code ? - a soft 404 for an old removed page that has no current replacement old removed pages returning a 404 The old pages didn't have any authority or inbound links hence is it best/ok to simply create a url removal request in GWT ? Cheers Dan0 -
What can I do if Google Webmaster Tools doesn't recognize the robots.txt file?
I'm working on a recently hacked site for a client and and in trying to identify how exactly the hack is running I need to use the fetch as Google bot feature in GWT. I'd love to use this but it thinks the robots.txt is blocking it's acces but the only thing in the robots.txt file is a link to the sitemap. Unde the Blocked URLs section of the GWT it shows that the robots.txt was last downloaded yesterday but it's incorrect information. Is there a way to force Google to look again?
Technical SEO | | DotCar0 -
Is there a reason to set a crawl-delay in the robots.txt?
I've recently encountered a site that has set a crawl-delay command set in their robots.txt file. I've never seen a need for this to be set since you can set that in Google Webmaster Tools for Googlebot. They have this command set for all crawlers, which seems odd to me. What are some reasons that someone would want to set it like that? I can't find any good information on it when researching.
Technical SEO | | MichaelWeisbaum0 -
Robots txt
We have a development site that we want google and other bots to stay out of but we want roger to have access. Currently our robots.txt looks like this: User-agent: *
Technical SEO | | LadyApollo
Disallow: /cgi-bin/
Disallow: /development/ What would i need to addd or change to let him through? Thank you.0 -
Invisible robots.txt?
So here's a weird one... Client comes to me for some simple changes, turns out there are some major issues with the site, one of which is that none of the correct content pages are showing up in Google, just ancillary (outdated) ones. Looks like an issue because even the main homepage isn't showing up with a "site:domain.com" So, I add to Webmaster Tools and, after an hour or so, I get the red bar of doom, "robots.txt is blocking important pages." I check it out in Webmasters and, sure enough, it's a "User agent: * Disallow /" ACK! But wait... there's no robots.txt to be found on the server. I can go to domain.com/robots.txt and see it but nothing via FTP. I upload a new one and, thankfully, that is now showing but I've never seen that before. Question is: can a robots.txt file be stored in a way that can't be seen? Thanks!
Technical SEO | | joshcanhelp0 -
Optimum title and description meta tag length
Hi all, I have read that a title tag and description tag length of 69 and 156 characters respectively, should be used as this is all that Google will show in the search results, but that search engine robots will read longer titles and descriptions and additional characters will have an effect on ranking algorithms. However, is there any SEO benefit in making title and description tags longer to include more keywords to aid ranking, even though the latter part won't be visible in the results. I have read elsewhere on this forum that there may be concerns with regards to keyword dilution, but what about keyword reinforcement, i.e. by a repetition of the main keyword at the end of the title/description (I mean in a readable manner here, not 'stuffed')? Thanks in advance, Gareth
Technical SEO | | gdavies090319770