Robots.txt disallow subdomain
-
Hi all,
I have a development subdomain whose contents get copied to the live domain. Because I don't want this dev subdomain to get crawled, I'd like to implement a robots.txt for it. The problem is that I don't want this robots.txt to end up disallowing the live domain once the site is copied over. Is there a way to create a robots.txt for the development subdomain only?
Thanks in advance!
-
I would suggest you talk to the developers, as Theo suggests, about excluding visitors from your test site.
-
The copying is a manual process, and I don't want to take any risks with the live environment. An HttpHandler for robots.txt could be a solution; I'm going to discuss this with one of our developers. Other suggestions are still welcome, of course!
-
Do you FTP-copy one domain to the other? If it's a manual process, keeping the test domain's robots.txt off the live site would be as simple as excluding that file from the copy.
If the copy is automated and you want the code's behavior to depend on the base URL, you could create an HttpHandler for robots.txt that delivers a different version based on the request host in the HTTP request header, as in the sketch below.
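A minimal sketch of that idea, using a standalone Python handler rather than the ASP.NET HttpHandler mentioned above; the hostnames are placeholders:

```python
# Serve a restrictive robots.txt on the dev host and a permissive one on the
# live host, based on the Host request header. Hostnames are placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer

DEV_HOSTS = {"dev.example.com"}

BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # dev: keep all crawlers out
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # live: allow everything

class RobotsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/robots.txt":
            self.send_error(404)
            return
        host = (self.headers.get("Host") or "").split(":")[0].lower()
        body = (BLOCK_ALL if host in DEV_HOSTS else ALLOW_ALL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), RobotsHandler).serve_forever()
```

Because dev and live run the exact same code, nothing needs to change when the site is copied; only the Host header decides which robots.txt is returned.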
-
You could use environment variables (for example in your env.ini or config.ini file) that are set to DEVELOPMENT, STAGING, or LIVE depending on the environment the code finds itself in.
With the exact same code, your website would then either restrict visitors by IP address (in the development environment) or allow all IP addresses (in the live environment). With this setup you can also vary other settings per environment, such as the level of detail shown in your error reporting, connecting to a testing database rather than a live one, etc. A sketch of this pattern follows below.
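A minimal sketch of that pattern in Python, assuming a config.ini with an [app] section whose environment key is DEVELOPMENT, STAGING, or LIVE; the file name, section and key names, and the office IP range are placeholders:

```python
# Read the environment name from config.ini and gate access accordingly.
import configparser
import ipaddress

config = configparser.ConfigParser()
config.read("config.ini")
ENVIRONMENT = config.get("app", "environment", fallback="DEVELOPMENT").upper()

OFFICE_NETWORK = ipaddress.ip_network("203.0.113.0/24")  # placeholder range

def is_request_allowed(client_ip: str) -> bool:
    """LIVE lets everyone in; DEVELOPMENT/STAGING only allow office IPs."""
    if ENVIRONMENT == "LIVE":
        return True
    return ipaddress.ip_address(client_ip) in OFFICE_NETWORK

def robots_txt() -> str:
    """Permissive robots.txt on LIVE, block-all everywhere else."""
    if ENVIRONMENT == "LIVE":
        return "User-agent: *\nDisallow:\n"
    return "User-agent: *\nDisallow: /\n"
```

The same config file is then the natural place for other per-environment settings, such as error-reporting verbosity or which database to connect to.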
[This was supposed to be a reply, but I accidentally clicked the wrong button. Hitting 'Delete reply' results in an error.]
-
Thanks for your quick reply, Theo. Unfortunately, this .htpasswd will also get copied to the live environment, so our websites would end up password-protected in production. Could there be any other solution for this?
-
I'm sure there is, but I'm guessing you also don't want any human visitors going to your development subdomain and viewing what is being done there? I'd suggest you either limit access by IP address (thereby effectively blocking out Google in one move) and/or implement a .htpasswd solution where developers log in with their credentials to your development area (which blocks out Google as well). A sketch of both is shown below.
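A minimal sketch of that setup, assuming Apache 2.4 and an .htaccess file placed only in the development subdomain's document root; the office IP range and the .htpasswd path are placeholders:

```apache
# .htaccess for the development subdomain only (Apache 2.4 syntax).
AuthType Basic
AuthName "Development area"
AuthUserFile /path/to/.htpasswd

<RequireAny>
    # The office network gets in without a password...
    Require ip 203.0.113.0/24
    # ...everyone else, including crawlers, must authenticate.
    Require valid-user
</RequireAny>
```

A visitor only needs to satisfy one of the two rules; crawlers satisfy neither, so they can't reach the dev content at all and a dev-only robots.txt becomes unnecessary.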
Related Questions
-
Twitter Robots.TXT
Hello Moz World, So, I'm trying to wrap my head around all of the different robots.txt files. I decided to dive into a site like Twitter and look at their robots.txt. And now I'm super confused. What are they telling the search engines with /hashtag/*src=? Why don't they just use: User-agent: * Disallow: But they address each search engine. Is there any benefit to this? Thanks for all of the awesome responses!!! B/R Will H.
Intermediate & Advanced SEO | MarketingChimp100
-
Robots.txt Allowed
Hello all, We want to block something that has the following at the end: http://www.domain.com/category/product/some+demo+-text-+example--writing+here So I was wondering if doing: /*example--writing+here would work?
Intermediate & Advanced SEO | ThomasHarvey0
-
Recovering old disallow file?
Hi guys, We had an SEO agency do a disallow request on one of our sites a while back. They have no trace of the disallow txt file or of the links they disallowed. Does anyone know if there is a way to recover this file in Google Webmaster Tools, or any way to find which links were disallowed? Cheers.
Intermediate & Advanced SEO | jayoliverwright0
-
Root Domain v Subdomain
Hi, Just doing some analysis on a domain, and the (external) linking root domains show as: 21 to the root domain, 4 to the subdomain. The site is hosted under the www. subdomain version and there is no 301 from domain to www.domain. Should the site be hosted on the root domain instead of the subdomain, or should all incoming requests on the domain be 301-redirected to www.domain (the subdomain)? Any comments and experience on this type of situation appreciated!
Intermediate & Advanced SEO | bjs2010
-
Robots.txt & Duplicate Content
In reviewing my crawl results I have 5666 pages of duplicate content. I believe this is because many of the indexed pages are just different ways to get to the same content. There is one primary culprit: a series of URLs related to CatalogSearch, for example http://www.careerbags.com/catalogsearch/result/index/?q=Mobile. I have 10074 of those links indexed according to my Moz crawl. Of those, 5349 are tagged as duplicate content; another 4725 are not. Here are some additional sample links:
http://www.careerbags.com/catalogsearch/result/index/?dir=desc&order=relevance&p=2&q=Amy
http://www.careerbags.com/catalogsearch/result/index/?color=28&q=bellemonde
http://www.careerbags.com/catalogsearch/result/index/?cat=9&color=241&dir=asc&order=relevance&q=baggallini
All of these links are just different ways of searching through our product catalog. My question is: should we disallow catalogsearch via the robots.txt file? Are these links doing more harm than good?
Intermediate & Advanced SEO | Careerbags
-
Soft 404's from pages blocked by robots.txt -- cause for concern?
We're seeing soft 404 errors appear in our Google Webmaster Tools account on pages that are blocked by robots.txt (our search result pages). Should we be concerned? Is there anything we can do about this?
Intermediate & Advanced SEO | nicole.healthline4
-
Robots.txt unblock
I'm currently having trouble with what appears to be a cached version of robots.txt. I'm being told via errors in my Google Sitemaps account that I'm denying Googlebot access to the entire site. I uploaded a clean "Allow" robots.txt yesterday, but receive the same error. I've tried "Fetch as Googlebot" on the index and other pages, but still get the error. Here is the latest: | Denied by robots.txt | 11/9/11 10:56 AM |. As I said, there has been no blocking in the robots.txt for 24 hours. HELP!
Intermediate & Advanced SEO | Elchanan
-
How to get subdomains to rank well?
Hi All, I am setting up a new site and I want to make use of subdomains to target multiple countries as follows: uk.mydomain.com, us.mydomain.com, australia.mydomain.com, etc. Now I know what you're all going to say: why not use folders, as they are more effective? Well, I did think of this but decided against it because I would like to make the best of a low-competition industry. I want to push my competitors as far down in the search engines as possible, and I plan to do this by targeting generic, non-locational search terms with both sites so I can hog the top four spots, as follows: www.mydomain.com, www.mydomain.com/keyterm, uk.mydomain.com, uk.mydomain.com/keyterm-in-the-UK. What steps can I take to ensure rank passes to my subdomains? Is it better to start the site with folders like www.mydomain.com/us/keyterm and then 301 them to subdomains at a later stage, or should I start with the subdomains?
Intermediate & Advanced SEO | Mulith1