How do you block development servers with robots.txt?
-
When we create client websites the urls are client.oursite.com. Google is indexing theses sites and attaching to our domain. How can we stop it with robots.txt? I've heard you need to have the robots file on both the main site and the dev sites... A code sample would be groovy. Thanks, TR
-
Added X robots tag into our headers on our development sites.
Just a note - if you use apache and have mod_pagespeed installed , it wall conflict and pagespeed will remove the X robots tag.
Begin Bad Bot Blocking
BrowserMatchNoCase Googlebot bad_bot
BrowserMatchNoCase bingbot bad_bot
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase magpie-crawler 1.1 bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase NetSeer crawler/2.0 bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase Moreoverbot/x.00 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase Jakarta Commons-HttpClient/3.0 bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase MaxPointCrawler/Nutch-1.1 bad_bot
BrowserMatchNoCase willow bad_bot
Order Deny,Allow
Deny from env=bad_botEnd Bad Bot Blocking
Header set X-Robots-Tag "noindex, nofollow"
Begin Bad Bot Blocking
BrowserMatchNoCase Googlebot bad_bot
BrowserMatchNoCase bingbot bad_bot
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase magpie-crawler 1.1 bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase NetSeer crawler/2.0 bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase Moreoverbot/x.00 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase Jakarta Commons-HttpClient/3.0 bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase MaxPointCrawler/Nutch-1.1 bad_bot
BrowserMatchNoCase willow bad_bot
Order Deny,Allow
Deny from env=bad_botEnd Bad Bot Blocking
Header set X-Robots-Tag "noindex, nofollow"
-
On the root of the development subdomain, use the following robots.txt content to block all robots.
User-agent: *
Disallow: /Next, verify the subdomain in Google Webmaster Tools as its own site, and request that that site be removed from the index.
For added protection:
- Make the robots.txt on the live site read only, so when you copy the dev site over you don't accidentally copy over the robots.txt saying to exclude everything
- Set up a code monitor on the robots.txt for both the dev site and the live site that checks the content of those files and alerts you if there are changes. I use https://polepositionweb.com/roi/codemonitor/index.php.
-
Like Daniel said you can use robots.txt to block spiders, but this won't guarantee exclusion of URLs showing up in search results. You could use x-robots-tag in the server headers. Generate a 403 every time user-agent hits the sub domain.
-
I put a .htaccess style password on the development site. If you make a robots.txt to block the site, make sure you don't accidentally put that on the production site.
-
Unfortunately I don't have that option.
-
Just use a directory instead of a sub-domain and then block that directory... that's the easiest way.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What Schema would a Web design/development/seo ageny use and what is the schema.org link?
What Schema would a Web design/development/SEO Ageny use, and what is the schema.org link? I cannot for the life of me figure it out. ProfessionalService has been deprecated.
On-Page Optimization | | TiagoPedreira0 -
Are there detrimental effects of having multiple robot tags
Hi All, I came across some pages on our site that have multiple robot tags, but they have the same directives. Two are identical while one is for Google only. I know there aren't any real benefits from having it set up this way, but are there any detrimental effects such as slowing down the bots crawling these pages? name="googlebot" content="index, follow, noodp"/> Thanks!
On-Page Optimization | | STP_SEO0 -
Solve duplicate content issues by using robots.txt
Hi, I have a primary website and beside that I also have some secondary websites with have same contents with primary website. This lead to duplicate content errors. Because of having many URL duplicate contents, so I want to use the robots.txt file to prevent google index the secondary websites to fix the duplicate content issue. Is it ok? Thank for any help!
On-Page Optimization | | JohnHuynh0 -
How do i block an entire category/directory with robots.txt?
Anyone has any idea how to block an entire product category, including all the products in that category using the robots.txt file? I'm using woocommerce in wordpress and i'd like to prevent bots from crawling every single one of products urls for now. The confusing part right now is that i have several different url structures linking to every single one of my products for example www.mystore.com/all-products, www.mystore.com/product-category, etc etc. I'm not really sure how i'd type it into the robots.txt file, or where to place the file. any help would be appreciated thanks
On-Page Optimization | | bricerhodes0 -
Do I need a robots meta tag on the homepage of my site?
Is it recommended to include on the homepage of your site site? I would like Google to index and follow my site. I am using WordPress and noticed my homepage is not including this meta tag, therefore wondering if I should include it?
On-Page Optimization | | asc760 -
Developing my footer area
Hi, My website has no footer area and I would like some advise on creating/optimising it for seo. Would it advise to keyword link to 10 of my key pages & Have my full company address and this will help with localising my website Company name address, Dublin, Ireland
On-Page Optimization | | Socialdude0 -
In my report of my website it was indicated that I had 19 links/locations blocked by meta-robots. What does this mean and how do I fix it. My website is a Wordpress website.
In my report of my website it was indicated that I had 19 links/locations blocked by meta-robots. What does this mean and how do I fix it. My website is a Wordpress website.
On-Page Optimization | | cyaindc0 -
Robots.txt file
Does it serve any purpose if we omit robots.txt file ? I wonder if spider has to read all the pages, why do we insert robots.txt file ?
On-Page Optimization | | seoug_20050