Blocking robots.txt
-
Do you know any methods to block robots.txt file from external users?
-
You can block on server side all IP except Google bot for any file, but it may lead to ban, because of cloaking.
-
There is no setting or feature to do this, but you can do it with code
You could detect the useragnet and create a robot.txt dynamically depending on the useragent.
But i cant see why you would want to do this. -
Hi Zsolt
I don't believe there is a way to, not in a Windows environment anyway.
Robots.txt has to be hosted at website root level, as in example.com/robots.txt which is where search engine bots look for it. If a bot can find it (which they usually need to if one exists) then a person can too.
If there are pages or folders that you don't want anybody to know about, exclude them from robots.txt and edit the Meta Robots tag on each page concerned to e.g. NoIndex, Follow or whatever is appropriate.
Regards
Simon
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt & meta noindex--site still shows up on Google Search
I have set up my robots.txt like this: User-agent: *
Technical SEO | | RoxBrock
Disallow: / and I have this meta tag in my on a Wordpress site, set up with SEO Yoast name="robots" content="noindex,follow"/> I did "Fetch as Google" on my Google Search Console My website is still showing up in the search results and it says this: "A description for this result is not available because of this site's robots.txt" This site has not shown up for years and now it is ranking above my site that I want to rank for this keyword. How do I get Google to ignore this site? This seems really weird and I'm confused how a site with little content, that has not been updated for years can rank higher than a site that is constantly updated and improved.1 -
Is there a limit to how many URLs you can put in a robots.txt file?
We have a site that has way too many urls caused by our crawlable faceted navigation. We are trying to purge 90% of our urls from the indexes. We put no index tags on the url combinations that we do no want indexed anymore, but it is taking google way too long to find the no index tags. Meanwhile we are getting hit with excessive url warnings and have been it by Panda. Would it help speed the process of purging urls if we added the urls to the robots.txt file? Could this cause any issues for us? Could it have the opposite effect and block the crawler from finding the urls, but not purge them from the index? The list could be in excess of 100MM urls.
Technical SEO | | kcb81780 -
Blocked URL's by robots.txt
In Google Webmaster Tools shows me 10,936 Blocked URL's by robots.txt and it is very strange when you go to the "Index Status" section where shows that since April 2012 robots.txt blocked many URL's. You can see more precise on the image attached (chart WMT) I can not explain why I have blocked URL's ? because I have nothing in robots.txt.
Technical SEO | | meralucian37
My robots.txt is like this: User-agent: * I thought I was penalized by Penguin in April 2012 because constantly i'am losing visitors now reaching over 40%. It may be a different penalty? Any help is welcome because i'm already so saturated. Mera robotstxt.jpg0 -
Timely use of robots.txt and meta noindex
Hi, I have been checking every possible resources for content removal, but I am still unsure on how to remove already indexed contents. When I use robots.txt alone, the urls will remain in the index, however no crawling budget is wasted on them, But still, e.g having 100,000+ completely identical login pages within the omitted results, might not mean anything good. When I use meta noindex alone, I keep my index clean, but also keep Googlebot busy with indexing these no-value pages. When I use robots.txt and meta noindex together for existing content, then I suggest Google, that please ignore my content, but at the same time, I restrict him from crawling the noindex tag. Robots.txt and url removal together still not a good solution, as I have failed to remove directories this way. It seems, that only exact urls could be removed like this. I need a clear solution, which solves both issues (index and crawling). What I try to do now, is the following: I remove these directories (one at a time to test the theory) from the robots.txt file, and at the same time, I add the meta noindex tag to all these pages within the directory. The indexed pages should start decreasing (while useless page crawling increasing), and once the number of these indexed pages are low or none, then I would put the directory back to robots.txt and keep the noindex on all of the pages within this directory. Can this work the way I imagine, or do you have a better way of doing so? Thank you in advance for all your help.
Technical SEO | | Dilbak0 -
RegEx help needed for robots.txt potential conflict
I've created a robots.txt file for a new Magento install and used an existing site-map that was on the Magento help forums but the trouble is I can't decipher something. It seems that I am allowing and disallowing access to the same expression for pagination. My robots.txt file (and a lot of other Magento site-maps it seems) includes both: Allow: /*?p= and Disallow: /?p=& I've searched for help on RegEx and I can't see what "&" does but it seems to me that I'm allowing crawler access to all pagination URLs, but then possibly disallowing access to all pagination URLs that include anything other than just the page number? I've looked at several resources and there is practically no reference to what "&" does... Can anyone shed any light on this, to ensure I am allowing suitable access to a shop? Thanks in advance for any assistance
Technical SEO | | MSTJames0 -
Blocking https from being crawled
I have an ecommerce site where https is being crawled for some pages. Wondering if the below solution will fix the issue www.example.com will be my domain In the nav there is a login page www.example.com/login which is redirecting to the https://www.example.com/login If I just disallowed /login in the robots file wouldn't it not follow the redirect and index that stuff? The redirect part is what I am questioning.
Technical SEO | | Sean_Dawes0 -
Robots.txt usage
Hey Guys, I am about make an important improvement to our site's robots.txt we have large number of properties on our site and we have different views for them. List, gallery and map view. By default list view shows up and user can navigate through gallery view. We donot want gallery pages to get indexed and want to save our crawl budget for more important pages. this is one example of our site: http://www.holiday-rentals.co.uk/France/r31.htm When you click on "gallery view" URL of this site will remain same in your address bar: but when you mouse over the "gallery view" tab it will show you URL with parameter "view=g". there are number of parameters: "view=g, view=l and view=m". http://www.holiday-rentals.co.uk/France/r31.htm?view=l http://www.holiday-rentals.co.uk/France/r31.htm?view=g http://www.holiday-rentals.co.uk/France/r31.htm?view=m Now my question is: I If restrict bots by adding "Disallow: ?view=" in our robots.txt will it effect the list view too? Will be very thankful if yo look into this for us. Many thanks Hassan I will test this on some other site within our network too before putting it to important one's. to measure the impact but will be waiting for your recommendations. Thanks
Technical SEO | | holidayseo0 -
Block a sub-domain from being indexed
This is a pretty quick and simple (i'm hoping) question. What is the best way to completely block a sub domain from getting indexed from all search engines? One item i cannot use is the meta "no follow" tag. Thanks! - Kyle
Technical SEO | | kchandler0