How do you block development servers with robots.txt?
-
When we create client websites, the URLs are client.oursite.com. Google is indexing these sites and attributing them to our domain. How can we stop it with robots.txt? I've heard you need to have the robots file on both the main site and the dev sites... A code sample would be groovy. Thanks, TR
-
We added an X-Robots-Tag to the headers on our development sites. Just a note: if you use Apache and have mod_pagespeed installed, it will conflict and PageSpeed will remove the X-Robots-Tag header (see the workaround sketch after the config below).
# Begin Bad Bot Blocking
BrowserMatchNoCase Googlebot bad_bot
BrowserMatchNoCase bingbot bad_bot
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase "magpie-crawler 1.1" bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase "NetSeer crawler/2.0" bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase Moreoverbot/x.00 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase "Jakarta Commons-HttpClient/3.0" bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase MaxPointCrawler/Nutch-1.1 bad_bot
BrowserMatchNoCase willow bad_bot
Order Deny,Allow
Deny from env=bad_bot
# End Bad Bot Blocking
Header set X-Robots-Tag "noindex, nofollow"
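On the mod_pagespeed conflict mentioned above: one workaround (a minimal sketch, assuming mod_pagespeed is loaded and the directive is permitted in your context) is to switch PageSpeed off on the development site so it can't strip the header:
# Hypothetical workaround: disable PageSpeed on the dev site so it leaves the X-Robots-Tag header alone
ModPagespeed off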
-
On the root of the development subdomain, use the following robots.txt content to block all robots.
User-agent: *
Disallow: /
Next, verify the subdomain in Google Webmaster Tools as its own site and request that the site be removed from the index.
For added protection:
- Make the robots.txt on the live site read-only, so when you copy the dev site over you don't accidentally overwrite it with the dev robots.txt that excludes everything (see the sketch after this list for one way to avoid the copy-over risk entirely).
- Set up a code monitor on the robots.txt for both the dev site and the live site that checks the content of those files and alerts you if they change. I use https://polepositionweb.com/roi/codemonitor/index.php.
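A related sketch (not from the original answer) that avoids the copy-over risk: keep the blocking rules in a separately named file and rewrite /robots.txt to it only on the dev hostname, so deploying the codebase to production never carries the blocking file across. The file name dev-robots.txt and the hostname pattern are assumptions based on the question:
# Hypothetical .htaccess on the dev subdomain: serve a blocking robots.txt without shipping one in the codebase
RewriteEngine On
# Match the dev hostname from the question (client.oursite.com)
RewriteCond %{HTTP_HOST} ^client\.oursite\.com$ [NC]
# Hand back dev-robots.txt (hypothetical file name) whenever robots.txt is requested
RewriteRule ^robots\.txt$ /dev-robots.txt [L]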
-
Like Daniel said, you can use robots.txt to block spiders, but this won't guarantee that URLs stay out of search results. You could use the X-Robots-Tag in the server headers, or generate a 403 every time a crawler's user-agent hits the subdomain.
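A minimal .htaccess sketch of that 403 approach, assuming Apache with mod_rewrite; the user-agent list here is illustrative, not exhaustive:
RewriteEngine On
# Return 403 Forbidden to common search crawlers hitting the dev subdomain (illustrative list)
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|yandex|baiduspider) [NC]
RewriteRule .* - [F,L]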
-
I put a .htaccess-style password on the development site. If you create a robots.txt that blocks the site, make sure you don't accidentally deploy it to the production site.
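A minimal sketch of that kind of password protection with HTTP Basic Auth in .htaccess; the realm name, file path, and username are placeholders:
AuthType Basic
AuthName "Development site"
# Placeholder path; create the password file with, e.g.: htpasswd -c /path/to/.htpasswd someuser
AuthUserFile /path/to/.htpasswd
Require valid-user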
-
Unfortunately I don't have that option.
-
Just use a directory instead of a sub-domain and then block that directory... that's the easiest way.
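For example, if the dev copies lived under a hypothetical /dev/ directory, the live site's robots.txt could block just that path:
User-agent: *
Disallow: /dev/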