Robots.txt File Redirects to Home Page
-
I've been doing some site analysis for a new SEO client and it has been brought to my attention that their robots.txt file redirects to their homepage. I was wondering:
Is there a benefit to setting up your robots.txt file to do this?
Will this affect how their site gets indexed?
Thanks for your response!
- Kyle
Site URL:
-
Yep, if you add a robots.txt it won't redirect. But I would look to remove the 404 redirect as well. It also looks to me like a meta refresh, which has potential SEO problems. I would much prefer a 301 if they are really keen to redirect 404s.
The main reason for not redirecting 404s is that it stops you from seeing broken links on your website. Imagine you have a discreet link to a services page that is broken - you wouldn't be able to pick it up with link checkers like Xenu and it could go unnoticed for months if not years. Might be worth suggesting to them that they remove it.
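To illustrate why the redirect hides broken links, here is a minimal Python sketch of what a status-based link checker effectively does. The function name `broken_links`, the paths, and the status codes are made up for the example; this is not how Xenu is implemented, just the general idea.

```python
def broken_links(final_statuses):
    """Return the links a simple status-based checker would report as broken.

    final_statuses maps each checked path to the HTTP status code the
    checker ends up with after following any redirects.
    """
    return sorted(path for path, status in final_statuses.items() if status >= 400)


# Hypothetical site that answers honestly: the missing page returns 404.
honest_site = {"/": 200, "/services": 404}

# Same hypothetical site with 404s redirected to the home page: the checker
# follows the redirect and ends on a 200, so the broken link looks healthy.
redirecting_site = {"/": 200, "/services": 200}

print(broken_links(honest_site))
print(broken_links(redirecting_site))
```

With the honest setup the broken services link is reported; with the redirect in place the checker has nothing to report.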
-
This is not normal behavior. The server should respond to robots.txt directly; put the sitemap link in there, or simply:
User-agent: *
Disallow:
The actual robots.txt request gives:
GET robots.txt 302 Found, which redirects to:
GET 404error.html 200 OK, which redirects to the home page in the browser via:
<meta http-equiv="refresh" content="0;url=/">
You'd be better off changing this to a normal response.
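If you want to check other pages for the same kind of meta refresh, something like the following Python sketch might help. It uses only the standard library; `MetaRefreshFinder` and `find_meta_refresh` are names I made up for the example, and it only handles the simple `delay;url=...` form shown above.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the content attribute of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() == "refresh":
            self.contents.append(attr_map.get("content", ""))


def find_meta_refresh(html):
    """Return the target URL of the first meta refresh in html, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    for content in finder.contents:
        # content looks like "0;url=/": a delay, then an optional url= part.
        _, _, rest = content.partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            return rest[4:].strip()
    return None


page = '<html><head><meta http-equiv="refresh" content="0;url=/"></head></html>'
print(find_meta_refresh(page))
```

Feeding it the body returned for the error page should tell you whether the redirect is happening in the browser rather than through an HTTP status code.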
-
Thanks for the input! I haven't had a chance to view their .htaccess file. I am still in the early stages of reviewing their site. I just wasn't sure if there would be a technical reason for them to do this or if it just happened by accident. It sounds like adding a basic robots.txt file would be the appropriate solution.
-
1. I wouldn't advise redirecting robots.txt to the home page. It seems that they have a dynamic 404 redirect system: when a URL doesn't exist, the site redirects it to the home page. There are good and bad points about this strategy; however, I would prefer NOT to do it.
2. Re getting the site indexed: no, it wouldn't hurt them, but it would give you much less control over the robots directives, in case you want to add custom instructions. If Google's crawlers can't get to the file (as in, it's not user-agent cloaked to allow Googlebot), you will not be able to do so (e.g. excluding pages via robots.txt won't be possible).
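To show the kind of control a real robots.txt gives you, here is a rough Python sketch using the standard library's `urllib.robotparser`. The `/private/` path and the `build_parser` helper are invented for the example, and this parser is only an approximation of how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The basic allow-everything file suggested in this thread.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# A hypothetical file with one custom instruction added.
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"

open_rules = build_parser(ALLOW_ALL)
custom_rules = build_parser(BLOCK_PRIVATE)

print(open_rules.can_fetch("Googlebot", "/private/page.html"))
print(custom_rules.can_fetch("Googlebot", "/private/page.html"))
print(custom_rules.can_fetch("Googlebot", "/services.html"))
```

The empty `Disallow:` permits everything, while the custom rule blocks only the hypothetical `/private/` directory. Neither is possible while the file itself redirects to the home page.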
-
I would be surprised if they purposefully redirected it. Have you been able to take a look at what's in the .htaccess file? If you copy and paste what's in there I might be able to see what's going on with it.
Also, if it is being redirected then it won't get crawled and so it won't have any effect. That could be good or bad depending on what you had written in the .txt file.
EDIT:
Just had a quick look at the site. It seems to 404 straight away and then redirect. Therefore I imagine the robots.txt file doesn't exist and they have it set up to redirect 404ing pages to the homepage. Something that I would advise against (it's useful to know what's 404ing).