How do you block development servers with robots.txt?
-
When we create client websites, the URLs are client.oursite.com. Google is indexing these sites and attaching them to our domain. How can we stop it with robots.txt? I've heard you need to have the robots file on both the main site and the dev sites... A code sample would be groovy. Thanks, TR
-
We added an X-Robots-Tag header on our development sites.
Just a note - if you use Apache and have mod_pagespeed installed, it will conflict and PageSpeed will strip the X-Robots-Tag header.
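If you would rather keep the header than fight PageSpeed, one option is simply to switch mod_pagespeed off for the development vhost so it cannot rewrite responses. A minimal sketch for the dev site's vhost or .htaccess, assuming mod_pagespeed's standard on/off directive:

# Disable mod_pagespeed entirely on the development site so it
# cannot rewrite responses or strip the X-Robots-Tag header
ModPagespeed off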
# Begin Bad Bot Blocking
# Each line sets the bad_bot environment variable when the User-Agent
# matches the pattern (case-insensitive regex, so the versioned entries
# are redundant with their unversioned counterparts but kept as posted).
BrowserMatchNoCase Googlebot bad_bot
BrowserMatchNoCase bingbot bad_bot
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase "magpie-crawler 1.1" bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase "NetSeer crawler/2.0" bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase Moreoverbot/x.00 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase "Jakarta Commons-HttpClient/3.0" bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase MaxPointCrawler/Nutch-1.1 bad_bot
BrowserMatchNoCase willow bad_bot
# Deny any request that matched above (Apache 2.2 syntax)
Order Deny,Allow
Deny from env=bad_bot
# End Bad Bot Blocking
# Tell any crawler that does get through not to index or follow
Header set X-Robots-Tag "noindex, nofollow"
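One caveat: Order/Deny,Allow is Apache 2.2 syntax, and on Apache 2.4 it only works if mod_access_compat is loaded. A minimal sketch of the 2.4-native equivalent, assuming the same bad_bot variable set by the BrowserMatchNoCase lines above:

# Apache 2.4 replacement for "Order Deny,Allow" / "Deny from env=bad_bot"
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>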
-
At the root of the development subdomain, use the following robots.txt content to block all robots:
User-agent: *
Disallow: /
Next, verify the subdomain in Google Webmaster Tools as its own site and request that the site be removed from the index.
For added protection:
- Make the robots.txt on the live site read-only, so that when you copy the dev site over you don't accidentally overwrite it with the dev robots.txt that excludes everything (see the sketch after this list)
- Set up a code monitor on the robots.txt for both the dev site and the live site that checks the content of those files and alerts you if there are changes. I use https://polepositionweb.com/roi/codemonitor/index.php.
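For the read-only suggestion, a minimal sketch on a Unix-like server (the path is hypothetical; point it at your live docroot):

# Strip write permission from the live robots.txt so a dev-to-live copy
# can't silently replace it (example path)
chmod 444 /var/www/livesite/robots.txt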
-
As Daniel said, you can use robots.txt to block spiders, but that won't guarantee the URLs stay out of search results. You could use an X-Robots-Tag in the server headers, and return a 403 whenever a crawler's user-agent hits the subdomain (a sketch follows).
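A minimal .htaccess sketch of that combination, assuming Apache 2.4 with mod_headers and mod_setenvif enabled (the user-agent pattern below is only an illustration, not a complete crawler list):

# Ask compliant crawlers not to index or follow anything they fetch
Header set X-Robots-Tag "noindex, nofollow"
# Flag common crawler user-agents (example pattern only)
SetEnvIfNoCase User-Agent "googlebot|bingbot|slurp|baiduspider" is_crawler
# Serve a 403 to flagged requests, allow everyone else
<RequireAll>
    Require all granted
    Require not env is_crawler
</RequireAll>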
-
I put a .htaccess-style password (HTTP Basic authentication) on the development site. If you make a robots.txt to block the site, make sure you don't accidentally put it on the production site.
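For reference, a minimal Basic-auth sketch for the dev site's .htaccess (the realm name and file path are hypothetical; the password file must sit outside the web root):

# Require a login for the whole development site
AuthType Basic
AuthName "Development site - login required"
# Hypothetical path to the credentials file
AuthUserFile /home/deploy/.htpasswd
Require valid-user

Credentials can be created with Apache's htpasswd utility, e.g. htpasswd -c /home/deploy/.htpasswd devuser.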
-
Unfortunately I don't have that option.
-
Just use a directory instead of a sub-domain and then block that directory... that's the easiest way.
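In that setup, the live site's robots.txt just disallows the development directory (the directory name here is hypothetical):

User-agent: *
Disallow: /dev/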