Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Robots.txt File Redirects to Home Page
-
I've been doing some site analysis for a new SEO client, and it has been brought to my attention that their robots.txt file redirects to their homepage. I was wondering:
Is there a benefit to setting up your robots.txt file to do this?
Will this affect how their site gets indexed?
Thanks for your response!
- Kyle
Site URL:
-
Yep, if you add a robots.txt file, it won't redirect. But I would also look to remove the 404 redirect. It looks to me like a meta refresh, which has potential SEO problems; I would much prefer a 301 if they are really keen to redirect 404s.
The main reason for not redirecting 404s is that it stops you from seeing broken links on your website. Imagine you have a discreet link to a services page that is broken: you wouldn't be able to pick it up with link checkers like Xenu, and it could go unnoticed for months, if not years. It might be worth suggesting to them that they remove it.
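To make the link-checker point concrete: once every missing URL ends up on the homepage, a checker that only looks at the final status sees a 200 and reports nothing. Below is a minimal Python sketch of the distinction a checker would have to make, assuming it records the requested URL plus the final status and final URL after following redirects. `classify_link` and the example.com URLs (including the deliberately mistyped "servcies" link) are hypothetical, not a feature of Xenu or any other tool.

```python
def classify_link(requested: str, status: int, final_url: str, homepage: str) -> str:
    """Classify one link-check result.

    `status` and `final_url` are what the checker saw after following
    any redirects for `requested`.
    """
    def norm(url: str) -> str:
        # Treat "https://example.com" and "https://example.com/" as the same URL.
        return url.rstrip("/")

    if status in (404, 410):
        return "broken"
    if status == 200 and norm(final_url) == norm(homepage) and norm(requested) != norm(homepage):
        # Looks healthy to a naive checker, but a missing page was sent to the homepage.
        return "masked-404"
    if 200 <= status < 300:
        return "ok"
    return "check manually"


home = "https://example.com/"
print(classify_link("https://example.com/servcies/", 200, home, home))  # masked-404
print(classify_link("https://example.com/old-page", 404, "https://example.com/old-page", home))  # broken
```

With a real 404 in place, the second case is what a checker reports; with the redirect-to-homepage setup, every broken link looks like the first case and has to be spotted by comparing URLs.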
-
This is not normal behavior. The robots.txt URL should return a proper response: put the sitemap link in there, or simply:
User-agent: *
Disallow:
The actual robots.txt request gives:
GET robots.txt: 302 Found, which redirects to:
GET 404error.html: 200 OK, which redirects to the homepage via browser behavior:
<meta http-equiv="refresh" content="0;url=/">
You had better change this to a normal response.
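The redirect chain described above can be reproduced offline to show why a crawler asking for robots.txt ends up on the homepage. This is a minimal Python sketch under stated assumptions: the `responses` dict is hypothetical canned data mirroring the reported chain (it stands in for real HTTP requests), `trace` and `refresh_target` are illustrative helper names, and the regex is a simplification that only handles meta tags with `http-equiv` before `content`, as in the tag quoted above.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Simplified pattern: assumes http-equiv comes before content, as in the tag above.
META_REFRESH = re.compile(
    r'<meta\s+http-equiv=["\']refresh["\']\s+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)
REDIRECT_CODES = {301, 302, 303, 307, 308}


def refresh_target(body: str) -> Optional[str]:
    """Return the url= part of a meta refresh tag, or None if there is none."""
    match = META_REFRESH.search(body)
    if not match:
        return None
    # content="0;url=/" -> delay "0", then "url=/"
    for part in match.group(1).split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.strip().lower() == "url":
            return value.strip()
    return None


def trace(url: str, responses: dict, max_hops: int = 10) -> list:
    """Follow HTTP redirects and meta refreshes through canned responses.

    `responses` maps URL -> (status, Location header or None, body) and
    stands in for real HTTP requests so the sketch runs offline.
    """
    hops = []
    for _ in range(max_hops):
        status, location, body = responses[url]
        hops.append((url, status))
        target = refresh_target(body) if status == 200 else None
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # HTTP-level redirect (e.g. the 302)
        elif target:
            url = urljoin(url, target)  # browser-style meta refresh
        else:
            break
    return hops


# Hypothetical responses mirroring the chain reported above.
SITE = "https://example.com"
responses = {
    SITE + "/robots.txt": (302, "/404error.html", ""),
    SITE + "/404error.html": (200, None, '<meta http-equiv="refresh" content="0;url=/">'),
    SITE + "/": (200, None, "<html><body>Home page</body></html>"),
}

for hop in trace(SITE + "/robots.txt", responses):
    print(hop)
# ('https://example.com/robots.txt', 302)
# ('https://example.com/404error.html', 200)
# ('https://example.com/', 200)
```

Note that only the first hop is a real HTTP redirect; the second is a 200 page that relies on the browser acting on the meta refresh, which is part of why a plain 200 (or 404) at /robots.txt is the cleaner setup.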
-
Thanks for the input! I haven't had a chance to view their .htaccess file; I am still in the early stages of reviewing their site. I just wasn't sure if there would be a technical reason for them to do this or if it just happened by accident. It sounds like adding a basic robots.txt file would be the appropriate solution.
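For reference, "a normal response" boils down to two behaviours: /robots.txt answers 200 with plain-text rules, and unknown paths get a genuine 404 instead of a redirect. The real fix belongs in whatever their web server is configured with (the .htaccess hints at Apache, but that is an assumption), so this is only a minimal Python sketch of the intended responses using the standard-library http.server. `ROBOTS_TXT`, `PAGES`, `respond`, `Handler` and `serve` are hypothetical names, and the rules are the basic allow-all file quoted above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Basic allow-all rules: an empty Disallow means nothing is blocked.
ROBOTS_TXT = "User-agent: *\nDisallow:\n"
PAGES = {"/": "<html><body>Home page</body></html>"}  # stand-in for real site content


def respond(path: str):
    """Decide (status, content_type, body) for a request path."""
    if path == "/robots.txt":
        return 200, "text/plain; charset=utf-8", ROBOTS_TXT
    if path in PAGES:
        return 200, "text/html; charset=utf-8", PAGES[path]
    # A real 404 instead of a redirect to the homepage, so broken links stay visible.
    return 404, "text/html; charset=utf-8", "<h1>Page not found</h1>"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string when routing.
        status, content_type, body = respond(self.path.split("?", 1)[0])
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and requesting http://127.0.0.1:8000/robots.txt should return the two rule lines, while any other unknown path returns 404 rather than bouncing to the homepage.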
-
1. I wouldn't advise redirecting the robots.txt file to the homepage. It seems that they have a dynamic 404 redirect system: when a URL doesn't exist, the site redirects it to the homepage. There are good and bad points about this strategy; however, I would prefer NOT to do it.
2. Re getting the site indexed: no, it wouldn't hurt them, but it would give you much less control over the robots directives in case you want to add custom instructions. If Google's crawlers can't get to the file (i.e. it isn't user-agent cloaked to let Googlebot through), you won't be able to do so (e.g. blocking pages from being crawled via robots.txt won't be possible).
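To illustrate the control point in 2.: with a reachable robots.txt, a Disallow line stops compliant crawlers from fetching matching paths. A minimal sketch using Python's standard-library urllib.robotparser, which is a generic parser and not Googlebot; the /private/ rule and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one custom instruction.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/private/report.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/services/"))  # True
```

Keep in mind that robots.txt governs crawling; a blocked URL can still be indexed if other pages link to it, so keeping pages out of the index is usually done with a noindex directive on a crawlable page instead.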
-
I would be surprised if they purposefully redirected it. Have you been able to take a look at what's in the .htaccess file? If you copy and paste what's in there I might be able to see what's going on with it.
Also, if it is being redirected then it won't get crawled and so it won't have any effect. That could be good or bad depending on what you had written in the .txt file.
EDIT:
Just had a quick look at the site. It seems to 404 straight away and then redirect. Therefore I imagine the robots.txt file doesn't exist and they have it set up to redirect 404ing pages to the homepage. Something that I would advise against (it's useful to know what's 404ing).
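Along the same lines, the "no effect" point can be illustrated with Python's standard-library parser (again not Googlebot, whose handling of redirected or missing robots.txt files is set out in Google's own documentation and may differ): if what finally comes back is the HTML error page, there are no User-agent or Disallow lines in it to obey, so nothing is blocked. The HTML body below is a hypothetical stand-in for their 404 page.

```python
from urllib.robotparser import RobotFileParser

# What arrives at the end of the redirect chain: an HTML page, not crawl rules.
html_body = """<html><head>
<meta http-equiv="refresh" content="0;url=/">
</head><body></body></html>"""

parser = RobotFileParser()
parser.parse(html_body.splitlines())

# No User-agent/Disallow lines were found, so every URL is allowed.
print(parser.can_fetch("Googlebot", "https://example.com/private/report.html"))  # True
```

Whatever they intended to write in the file, none of it is in effect while the URL resolves to an HTML page.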
Related Questions
-
Getting rid of pagination - redirect all paginated pages or leave them to 404?
Hi all, We're currently in the process of updating our website and we've agreed that one of the things we want to do is get rid of all our pagination (currently used on the blog and product review areas) and instead implement load more on scroll. The question I have is... should we redirect all of the paginated pages and if so, where to? (My initial thoughts were either to the blog homepage or to the archive page) OR do we leave them to just 404? Bear in mind we have thousands of paginated pages 😕 Here's our blog area btw - https://www.ihasco.co.uk/blog Any help would be appreciated, thanks!
Technical SEO | iHasco -
One robots.txt file for multiple sites?
I have 2 sites hosted with Blue Host and was told to put the robots.txt in the root folder and just use the one robots.txt for both sites. Is this right? It seems wrong. I want to block certain things on one site. Thanks for the help, Rena
Technical SEO | renalynd27 -
Is it good to redirect millions of pages to a single page?
My site has approximately 10 lakh (1 million) genuine URLs, but due to some unidentified bugs the site has created approximately 10 million irrelevant URLs. Since we don't know the origin of these non-relevant links, we want to redirect or remove all of them. Please suggest whether it is better to redirect such a high number of URLs to the homepage or to return a 404 for these pages, or any other suggestions to solve this issue.
Technical SEO | vivekrathore -
Blocking Affiliate Links via robots.txt
Hi, I work with a client who has a large affiliate network pointing to their domain, which is a large part of their inbound marketing strategy. All of these links point to a subdomain, affiliates.example.com, which then redirects the links through a 301 redirect to the relevant target page for the link. These links have been showing up in Webmaster Tools as top linking domains and also in the latest downloaded links reports. To follow guidelines and ensure that these links aren't counted by Google for either positive or negative impact on the site, we have added a block to the robots.txt of the affiliates.example.com subdomain, blocking search engines from crawling the full subdomain. The robots.txt file is the following code:
User-agent: *
Disallow: /
We have authenticated the subdomain with Google Webmaster Tools and made certain that Google can reach and read the robots.txt file. We know they are being blocked from reading the affiliates subdomain. However, although we added this block to the affiliates subdomain's robots.txt a few weeks ago, links are still showing up in the latest downloads report as first being discovered after we added the block. It's been a few weeks already, and we want to make sure that the block was implemented properly and that these links aren't being used to negatively impact the site. Any suggestions or clarification would be helpful: if the subdomain is being blocked for the search engines, why are the search engines following the links and reporting them as latest links in the GWMT account for the www.example.com subdomain?
And if the block is implemented properly, will the total number of links pointing to our site, as reported in the "Links to your site" section, be reduced, or does this not have an impact on that figure? From a development standpoint, it's a much easier fix for us to adjust the robots.txt file than to change the affiliate linking connection from a 301 to a 302, which is why we decided to go with this option. Any help you can offer will be greatly appreciated.
Thanks,
Mark
Technical SEO | Mark_Ginsberg -
Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?
I've got several URLs that I need to disallow in my robots.txt file. For example, I've got several documents that I don't want indexed and filters that are getting flagged as duplicate content. Rather than typing in thousands of URLs, I was hoping that wildcards were still valid.
Technical SEO | mkhGT -
Which should be used for 404 pages: a 301 or a 302 redirect?
Please suggest which redirect we should use for 404 pages: 301 or 302. If you can elaborate with your reasons, it will be highly appreciated.
Technical SEO | koamit -
Removing robots.txt on WordPress site problem
Hi, I am a little confused. I ticked the box in WordPress to allow search engines to crawl my site (previously I had asked them not to), but Google Webmaster Tools is telling me that robots.txt is still blocking them, so I am unable to submit the sitemap. I checked the source code and the robots instruction has gone, so I'm a little lost. Any ideas, please?
Technical SEO | Wallander -
Subdomain Removal in Robots.txt with Conditional Logic??
I would like to see if there is a way to add conditional logic to the robots.txt file so that when we push from DEV to PRODUCTION and the robots.txt file is pushed, we don't have to remember to NOT push the robots.txt file OR edit it when it goes live. My specific situation is this: I have www.website.com, dev.website.com and new.website.com and somehow google has indexed the DEV.website.com and NEW.website.com and I'd like these to be removed from google's index as they are causing duplicate content. Should I: a) add 2 new GWT entries for DEV.website.com and NEW.website.com and VERIFY ownership - if I do this, then when the files are pushed to LIVE won't the files contain the VERIFY META CODE for the DEV version even though it's now LIVE? (hope that makes sense) b) write a robots.txt file that specifies "DISALLOW: DEV.website.com/" is that possible? I have only seen examples of DISALLOW with a "/" in the beginning... Hope this makes sense, can really use the help! I'm on a Windows Server 2008 box running ColdFusion websites.
Technical SEO | ErnieB