How do you block development servers with robots.txt?
-
When we create client websites the urls are client.oursite.com. Google is indexing theses sites and attaching to our domain. How can we stop it with robots.txt? I've heard you need to have the robots file on both the main site and the dev sites... A code sample would be groovy. Thanks, TR
-
Added X robots tag into our headers on our development sites.
Just a note - if you use apache and have mod_pagespeed installed , it wall conflict and pagespeed will remove the X robots tag.
Begin Bad Bot Blocking
BrowserMatchNoCase Googlebot bad_bot
BrowserMatchNoCase bingbot bad_bot
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase magpie-crawler 1.1 bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase NetSeer crawler/2.0 bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase Moreoverbot/x.00 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase Jakarta Commons-HttpClient/3.0 bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase MaxPointCrawler/Nutch-1.1 bad_bot
BrowserMatchNoCase willow bad_bot
Order Deny,Allow
Deny from env=bad_botEnd Bad Bot Blocking
Header set X-Robots-Tag "noindex, nofollow"
Begin Bad Bot Blocking
BrowserMatchNoCase Googlebot bad_bot
BrowserMatchNoCase bingbot bad_bot
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase magpie-crawler 1.1 bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase NetSeer crawler/2.0 bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase Moreoverbot/x.00 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase Jakarta Commons-HttpClient/3.0 bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase MaxPointCrawler/Nutch-1.1 bad_bot
BrowserMatchNoCase willow bad_bot
Order Deny,Allow
Deny from env=bad_botEnd Bad Bot Blocking
Header set X-Robots-Tag "noindex, nofollow"
-
On the root of the development subdomain, use the following robots.txt content to block all robots.
User-agent: *
Disallow: /Next, verify the subdomain in Google Webmaster Tools as its own site, and request that that site be removed from the index.
For added protection:
- Make the robots.txt on the live site read only, so when you copy the dev site over you don't accidentally copy over the robots.txt saying to exclude everything
- Set up a code monitor on the robots.txt for both the dev site and the live site that checks the content of those files and alerts you if there are changes. I use https://polepositionweb.com/roi/codemonitor/index.php.
-
Like Daniel said you can use robots.txt to block spiders, but this won't guarantee exclusion of URLs showing up in search results. You could use x-robots-tag in the server headers. Generate a 403 every time user-agent hits the sub domain.
-
I put a .htaccess style password on the development site. If you make a robots.txt to block the site, make sure you don't accidentally put that on the production site.
-
Unfortunately I don't have that option.
-
Just use a directory instead of a sub-domain and then block that directory... that's the easiest way.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to block index of link and content
Hi, We have pages where articles are shown and in the sides we have small snippets of Articles which shows the title and close to 25 words and a image. When i search for something in Google the snippet image and content is shown and in Google when clicked it redirects to a page which is not meant to be shown for the keyword the visitor is querying Is there a way i can block all the links and content shown in the right and left side of the page so Google does not get confused with the page content thats not related to that page? thanks
On-Page Optimization | | AlexisWithers0 -
Robot.txt file issue on wordpress site.
I m facing the issue with robot.txt file on my blog. Two weeks ago i done some development work on my blog. I just added few pages in robot file. Now my complete site seems to be blocked. I have checked and update the file and still having issue. The search result shows that "A description for this result is not available because of this site's robots.txt – learn more." Any suggestion to over come with this issue
On-Page Optimization | | Mustansar0 -
How can i block the below URLs
Google indexed plugins pages for my website. Please check below. How can stop them to be indexed on google.? http://www.ayurjeewan.com/wp-content/plugins/LayerSlider/static/skins/glass/ http://www.ayurjeewan.com/wp-content/plugins/LayerSlider/static/skins/borderlesslight3d/ http://www.ayurjeewan.com/wp-content/plugins/LayerSlider/static/skins/defaultskin/ My robots.txt file is - User-agent: * Disallow: /wp-admin/
On-Page Optimization | | MasonBaker0 -
Description tag not showing in the SERPs because page is blocked by Robots, but the page isn't blocked. Any help?
While checking some SERP results for a few pages of a site this morning I noticed that some pages were returning this message instead of a description tag, A description for this result is not avaliable because of this site's robot.s.txt The odd thing is the page isn't blocked in the Robots.txt. The page is using Yoast SEO Plugin to populate meta data though. Anyone else had this happen and have a fix?
On-Page Optimization | | mac22330 -
Robots file include sitemap
Hello, I see that google, facebook and moz... have robots.txt include sitemap at the footer.
On-Page Optimization | | JohnHuynh
Eg: http://www.google.com.vn/robots.txt Sitemap: http://www.google.com/sitemaps_webmasters.xml
Sitemap: http://www.google.com/ventures/sitemap_ventures.xml Should I include my sitemap file (sitemap.xml) at the footer of robots.txt and why should do this? Thanks,0 -
How do i block an entire category/directory with robots.txt?
Anyone has any idea how to block an entire product category, including all the products in that category using the robots.txt file? I'm using woocommerce in wordpress and i'd like to prevent bots from crawling every single one of products urls for now. The confusing part right now is that i have several different url structures linking to every single one of my products for example www.mystore.com/all-products, www.mystore.com/product-category, etc etc. I'm not really sure how i'd type it into the robots.txt file, or where to place the file. any help would be appreciated thanks
On-Page Optimization | | bricerhodes0 -
New CMS system - 100,000 old urls - use robots.txt to block?
Hello. My website has recently switched to a new CMS system. Over the last 10 years or so, we've used 3 different CMS systems on our current domain. As expected, this has resulted in lots of urls. Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel 'canonical' Using SEOmoz's tools and GWMT, I've been able to locate and redirect all pertinent, page-rank bearing, "older" urls to their new counterparts..however, according to Google Webmaster tools 'Not Found' report, there are literally over 100,000 additional urls out there it's trying to find. My question is, is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently, we allow everything - only using page level robots tags to disallow where necessary. Thanks!
On-Page Optimization | | Blenny0 -
Right way to block google robots from ppc landing pages
What is the right way to completely block seo robots from my adword landing pages? Robots.txt does not work really good for that, as far I know. Adding metatags noindex nofollow on the other side will block adwords robot as well. right? Thank you very much, Serge
On-Page Optimization | | Kotkov0