Robots.txt disallow subdomain
-
Hi all,
I have a development subdomain, which gets copied to the live domain. Because I don't want this dev domain to get crawled, I'd like to implement a robots.txt for this domain only. The problem is that I don't want this robots.txt to disallow the live domain. Is there a way to create a robots.txt for this development subdomain only?
Thanks in advance!
-
I would suggest you talk to your developers, as Theo suggests, about excluding visitors from your test site.
-
The copying is a manual process and I don't want to take any risks with the live environment. An HttpHandler for robots.txt could be a solution and I'm going to discuss this with one of our developers. Other suggestions are still welcome, of course!
-
Do you FTP copy one domain to the other? If this is a manual process, then keeping the robots.txt off the live domain would be as simple as excluding that one file from the copy.
If you automate the copy and want the code to behave differently based on the base URL, then you could create an HttpHandler for robots.txt that delivers a different version based on the request URL host in the HTTP request header.
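As a rough sketch of that host-based handler idea (shown here in Python purely for illustration; the thread mentions an ASP.NET HttpHandler, and the hostnames below are placeholders, not from the thread):

```python
# Illustrative sketch: serve different robots.txt content depending on
# the requested hostname, so one copied-around deployment can block
# crawlers on dev while leaving the live site open. "dev." is a
# placeholder prefix for the development subdomain.

BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # dev: keep all crawlers out
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # live: block nothing

def robots_txt_for(host: str) -> str:
    """Return the robots.txt body appropriate for the requesting host."""
    hostname = host.split(":")[0].lower()  # drop an optional ":port"
    if hostname.startswith("dev."):
        return BLOCK_ALL
    return ALLOW_ALL
```

Wired into whatever serves /robots.txt, the same code can then be copied to both hosts without any risk of blocking the live domain.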
-
You could use environment variables (for example in your env.ini or config.ini file) that are set to DEVELOPMENT, STAGING, or LIVE depending on the environment the code finds itself in.
With the exact same code, your website would either be limiting IP addresses (on the development environment) or allow all IP addresses (in the live environment). With this setup you can also set different variables per environment such as the level of detail that is shown in your error reporting, connect to a testing database rather than a live one, etc.
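A minimal sketch of that pattern, assuming a variable named APP_ENV and placeholder IP addresses (both are illustrative, not from the thread):

```python
import os

# Illustrative sketch: the exact same code runs in every environment,
# and only the environment variable decides who gets in. APP_ENV and
# the IP list are assumed names/values for the example.

ALLOWED_DEV_IPS = {"203.0.113.10", "203.0.113.11"}  # e.g. office IPs

def is_request_allowed(client_ip, env=None):
    """DEVELOPMENT restricts access to whitelisted IPs; LIVE allows all."""
    env = (env or os.environ.get("APP_ENV", "LIVE")).upper()
    if env == "DEVELOPMENT":
        return client_ip in ALLOWED_DEV_IPS
    return True
```

The same switch can pick the database connection, error-reporting level, and so on per environment, as described above.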
[This was supposed to be a reply, but I accidentally clicked the wrong button. Hitting 'Delete reply' results in an error.]
-
Thanks for your quick reply, Theo. Unfortunately, this htpasswd will also get copied to the live environment, so our websites will get password protected live. Could there be any other solution for this?
-
I'm sure there is, but I'm guessing you don't want any human visitors to go to your development subdomain and view what is being done there as well? I'd suggest you either limit the visitors that have access by IP address (thereby effectively blocking out Google in one move) and/or implement a .htpasswd solution where developers can log in with their credentials to your development area (which blocks out Google as well).
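For an Apache-hosted site, the combined IP-plus-password approach described above might look roughly like this in the dev subdomain's .htaccess (a sketch; the file path, realm name, and IP range are placeholders):

```apache
# Apache 2.4 syntax: let whitelisted IPs straight through, and ask
# everyone else for .htpasswd credentials. Either condition grants
# access; crawlers satisfy neither.
AuthType Basic
AuthName "Development area"
AuthUserFile /path/to/.htpasswd
<RequireAny>
    Require ip 203.0.113.0/24
    Require valid-user
</RequireAny>
```

Note that, as raised above, any such file would still need to be kept out of the copy to the live domain.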
Related Questions
-
Backlinks to less important subdomain
We have two subdomains on our site: blogs and www. Our most important and competitive pages are on the www subdomain. I have some pages on the blogs subdomain that have valuable backlinks. Would it be helpful to the www subdomain's SEO to move those pages from the blogs subdomain to the www subdomain and redirect the old URLs?
Intermediate & Advanced SEO | RCF0
-
Subdomain vs totally new domain
Problem: Our organization publishes maps for public viewing using Google Maps. We are currently getting limited value from these links. We need to separate our public and private maps for infrastructure purposes, and are weighing up the strengths and weaknesses of separating by domain or subdomain with regard to SEO and infrastructure.
Current situation: maps.mycompany.com currently has a page authority of 30 and mycompany.com has a domain authority of 39. We are currently only getting links from 8 maps which are shared via social media, whereas most people embed our maps on their website using an iframe, which I believe doesn't do us any favours with SEO. We currently have approx 3K public maps.
Question: What SEO impact can you see if we move our public maps from the subdomain maps.mycompany.com to mycompanypublicmaps.com? Thanks in advance for your help, and happy to give more info if you need it!
Intermediate & Advanced SEO | eSpatial0
-
Duplicate Content: Is a product feed/page rolled out across subdomains deemed duplicate content?
A company has a TLD (top-level domain) which hosts every single product: company.com/product/name.html The company also has subdomains (tailored to a range of products) which list a chosen selection of the products from the TLD - sort of like a feed: subdomain.company.com/product/name.html The content on the TLD and subdomain product pages is exactly the same and cannot be changed; the CSS and HTML are slightly different, but the content (text and images) is exactly the same! My concern (and rightly so) is that Google will deem this to be duplicate content, therefore I'm going to have to add a rel canonical tag into the header of all subdomain pages, pointing to the original product page on the TLD. Does this sound like the correct thing to do? Or is there a better solution? Moving on, not only are products fed onto the subdomain, there are a handful of other domains which list the products, and again, the content (text and images) is exactly the same: other.com/product/name.html Would I be best placed to add a rel canonical tag into the header of the product pages on those other domains, pointing to the original product page on the actual TLD? Does rel canonical work across domains? Would the product pages with a rel canonical tag in the header still rank? Let me know if there is a better solution all-round!
Intermediate & Advanced SEO | iam-sold0
-
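Cross-domain rel=canonical does work: Google announced support for cross-domain canonicals in 2009, though it treats them as a strong hint rather than a directive. Using the URLs from the question above, the tag on each duplicate would look like:

```html
<!-- In the <head> of subdomain.company.com/product/name.html
     and of other.com/product/name.html -->
<link rel="canonical" href="https://company.com/product/name.html" />
```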
We are switching our CMS local pages from a subdomain approach to a subfolder approach. What's the best way to handle this? Should we redirect every local subdomain page to its new subfolder page?
We are looking to create a new subfolder approach within our website versus our current subdomain approach. How should we go about handling this properly, so as not to lose everything we've worked on up to this point using the subdomain approach? Do we need to redirect every subdomain URL to the new subfolder page? Our current local pages subdomain set-up: stores.websitename.com How we plan on adding our new local subfolder set-up: websitename.com/stores/state/city/storelocation Any and all help is appreciated.
Intermediate & Advanced SEO | SEO.CIC0
-
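A 301 redirect from every old subdomain URL to its subfolder equivalent is the standard way to preserve the equity built up so far. On an Apache setup (an assumption; the rule below also presumes the old and new paths map one-to-one) it could be sketched as:

```apache
# In the stores.websitename.com vhost or .htaccess: 301 each old
# subdomain URL to the matching subfolder URL on the main domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^stores\.websitename\.com$ [NC]
RewriteRule ^(.*)$ https://websitename.com/stores/$1 [R=301,L]
```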
Block in robots.txt instead of using canonical?
When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed? In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt?
Intermediate & Advanced SEO | YairSpolter0
-
Blocking out specific URLs with robots.txt
I've been trying to block out a few URLs using robots.txt, but I can't seem to get the specific one I'm trying to block. Here is an example. I'm trying to block something.com/cats but not block something.com/cats-and-dogs. It seems if I set up my robots.txt like so... Disallow: /cats It's blocking both URLs. When I crawl the site with Screaming Frog, that Disallow is causing both URLs to be blocked. How can I set up my robots.txt to specifically block /cats? I thought it was by doing it the way I was, but that doesn't seem to solve it. Any help is much appreciated, thanks in advance.
Intermediate & Advanced SEO | Whebb0
-
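Google and Bing support a $ end-of-URL anchor in robots.txt, which addresses exactly this: /cats matches as a prefix by default, so anchoring the rule stops it from also catching /cats-and-dogs. A sketch:

```
User-agent: *
# "$" anchors the match at the end of the URL: /cats is blocked,
# but /cats-and-dogs is not. Note that /cats/ and /cats?page=2
# are also left unblocked by this rule.
Disallow: /cats$
```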
Using 2 wildcards in the robots.txt file
I have a URL string which I don't want to be indexed. It includes the characters _Q1 in the middle of the string. So in the robots.txt, can I use 2 wildcards in the string to take out all of the URLs with that in it? So something like /_Q1. Will that pick up and block every URL with those characters in the string? Also, this is not directly off the root, but in a secondary directory, so .com/.../_Q1. So do I have to format the robots.txt as //_Q1* as it will be in the second folder, or will just using /_Q1 pick up everything no matter what folder it is in? Thanks.
Intermediate & Advanced SEO | seo1234560
-
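Under the wildcard matching Google supports, a single rule with a leading * covers any folder depth, so no per-directory version is needed. Roughly:

```
User-agent: *
# "*" matches any sequence of characters, so this blocks URLs
# containing "_Q1" at any depth, e.g. /shop/widgets/page_Q1
# as well as /page_Q1.
Disallow: /*_Q1
```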
Block an entire subdomain with robots.txt?
Is it possible to block an entire subdomain with robots.txt? I write for a blog that has their root domain as well as a subdomain pointing to the exact same IP. Getting rid of the subdomain is not an option, so I'd like to explore other options to avoid duplicate content. Any ideas?
Intermediate & Advanced SEO | kylesuss12
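Crawlers fetch robots.txt per hostname, so a blocking file served at subdomain.example.com/robots.txt affects only that host. The catch in the setup described above (two hostnames on one IP) is that if both hostnames also share one document root, both will serve the same file, in which case the robots.txt has to be generated dynamically per Host header. The blocking file itself is just:

```
User-agent: *
Disallow: /
```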