Robots.txt disallow subdomain
-
Hi all,
I have a development subdomain, which gets copied to the live domain. Because I don't want this dev domain to get crawled, I'd like to implement a robots.txt for this subdomain only. The problem is that the file would get copied across with everything else, and I don't want it to disallow the live domain. Is there a way to create a robots.txt for the development subdomain only?
Thanks in advance!
-
I would suggest you talk to your developers, as Theo suggests, about excluding visitors from your test site.
-
The copying is a manual process, but I don't want to take any risks with the live environment. An HttpHandler for robots.txt could be a solution, and I'm going to discuss it with one of our developers. Other suggestions are still welcome, of course!
-
Do you FTP-copy one domain to the other? If the copy is a manual process, keeping the robots.txt off the live domain would be as simple as excluding that one file from the copy.
If you automate the copy and want the code to behave differently per base URL, you could create an HttpHandler for robots.txt that delivers a different version depending on the host in the HTTP request header; a sketch of this follows.
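For what it's worth, here is one way such a handler could look in C#. This is only a minimal sketch, assuming an ASP.NET site and a hypothetical dev host name of dev.example.com; you would still need to map the robots.txt path to the handler in web.config.

```csharp
using System;
using System.Web;

// Serves a blocking robots.txt on the dev subdomain only, so the exact
// same code can be deployed to both environments.
public class RobotsHandler : IHttpHandler
{
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";

        // "dev." is a hypothetical prefix; adjust to your actual subdomain.
        string host = context.Request.Url.Host;
        if (host.StartsWith("dev.", StringComparison.OrdinalIgnoreCase))
        {
            // Development: keep all crawlers out.
            context.Response.Write("User-agent: *\nDisallow: /\n");
        }
        else
        {
            // Live: allow everything.
            context.Response.Write("User-agent: *\nDisallow:\n");
        }
    }
}
```

Because the decision is made per request, a blind copy from dev to live can never ship a blocking robots.txt to the live domain.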
-
You could use environment variables (set, for example, in your env.ini or config.ini file) that hold DEVELOPMENT, STAGING, or LIVE, depending on the environment the code finds itself in.
With the exact same code, your website would then either limit IP addresses (in the development environment) or allow all IP addresses (in the live environment). With this setup you can also vary other settings per environment, such as the level of detail shown in your error reporting, connecting to a testing database rather than the live one, etc. A sketch of the idea follows.
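As a rough illustration in the same stack discussed above, a sketch assuming a hypothetical Environment app setting in web.config and a hypothetical office IP address:

```csharp
using System;
using System.Configuration;
using System.Web;

public static class EnvironmentGate
{
    // Hypothetical whitelist of addresses allowed into the dev environment.
    private static readonly string[] AllowedIps = { "203.0.113.10" };

    public static bool IsRequestAllowed(HttpRequest request)
    {
        // Reads e.g. <add key="Environment" value="DEVELOPMENT" /> from
        // web.config; this value is the only thing that differs per host.
        string environment = ConfigurationManager.AppSettings["Environment"];

        // Live traffic is never restricted.
        if (string.Equals(environment, "LIVE", StringComparison.Ordinal))
        {
            return true;
        }

        // Development and staging traffic must come from a whitelisted IP.
        return Array.IndexOf(AllowedIps, request.UserHostAddress) >= 0;
    }
}
```

The same switch could also select the robots.txt variant, the error-reporting level, or the database connection string.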
[This was supposed to be a reply, but I accidentally clicked the wrong button. Hitting 'Delete reply' results in an error.]
-
Thanks for your quick reply, Theo. Unfortunately, this .htpasswd will also get copied to the live environment, so our websites would end up password-protected live. Could there be any other solution for this?
-
I'm sure there is, but I'm guessing you also don't want any human visitors going to your development subdomain and viewing what is being done there? I'd suggest you either limit access by IP address (thereby effectively blocking out Google in one move) and/or implement a .htpasswd solution where developers can log in with their credentials to your development area (which blocks out Google as well). A configuration sketch follows below.
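If your development server runs Apache, the classic way to combine both checks looks like the sketch below, with a hypothetical office IP and password file path; Satisfy Any (Apache 2.2-style syntax) lets a visitor in if either the IP rule or the login succeeds:

```apache
AuthType Basic
AuthName "Development area"
AuthUserFile /path/to/.htpasswd
Require valid-user

Order deny,allow
Deny from all
# Hypothetical office IP; list as many as you need.
Allow from 203.0.113.10

# Allow access if EITHER the IP matches OR the login succeeds.
Satisfy Any
```

As noted elsewhere in the thread, any .htaccess or .htpasswd file placed inside the copied tree would travel to the live environment, so it has to be excluded from the copy or kept outside the web root.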
Related Questions
-
Backlink from same domain but different subdomain? Any juice here?
Will I be able to get link juice from the same domain but a different subdomain? Let's say there is a website featuring my topic on multiple subdomains, each linking to me. Is there any benefit, or will it be considered one link?
Intermediate & Advanced SEO | SIMON-CULL
-
Robots.txt Blocking - Best Practices
Hi All, We have a web provider who's not willing to remove the wildcard line of code blocking all agents from crawling our client's site (User-agent: *, Disallow: /). They have other lines allowing certain bots to crawl the site, but we're wondering if they're missing out on organic traffic by having this main blocking line. It's also a pain because we're unable to set up Moz Pro, potentially because of this first line. We've researched and haven't found a ton of best practices regarding blocking all bots, then allowing certain ones. What do you think is a best practice for these files? Thanks! The file currently reads:
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow:
Crawl-delay: 5
User-agent: Yahoo-slurp
Disallow:
User-agent: bingbot
Disallow:
User-agent: rogerbot
Disallow:
User-agent: *
Crawl-delay: 5
Disallow: /new_vehicle_detail.asp
Disallow: /new_vehicle_compare.asp
Disallow: /news_article.asp
Disallow: /new_model_detail_print.asp
Disallow: /used_bikes/
Disallow: /default.asp?page=xCompareModels
Disallow: /fiche_section_detail.asp
Intermediate & Advanced SEO | ReunionMarketing
-
Our main domain has thousands of subdomains with the same content (expired hosting). How should we handle them?
Hello, Our client allows users to create free-trial subdomains, and once the trial expires, all those subdomains show the same page. If people stick around, their own websites are hosted on the subdomain. Since all these expired-trial subdomains have the same content and link to the homepage, should those links be nofollow? Has anyone dealt with something similar? Thanks very much in advance,
Intermediate & Advanced SEO | SCAILLE
-
Blocking out specific URLs with robots.txt
I've been trying to block out a few URLs using robots.txt, but I can't seem to get the specific one I'm trying to block. Here is an example: I'm trying to block something.com/cats but not block something.com/cats-and-dogs. It seems if I set up my robots.txt like so: Disallow: /cats It's blocking both URLs. When I crawl the site with Screaming Frog, that Disallow is causing both URLs to be blocked. How can I set up my robots.txt to specifically block /cats? I thought I was doing it the right way, but that doesn't seem to solve it. Any help is much appreciated, thanks in advance.
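For illustration (this is a Google extension rather than part of the original robots.txt standard): Google's parser, and most major crawlers, support a $ end-anchor, so something like the following should block the exact /cats URL, and optionally the /cats/ subtree, without touching /cats-and-dogs:

```
User-agent: *
Disallow: /cats$
Disallow: /cats/
```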
Intermediate & Advanced SEO | Whebb
-
Meta NoIndex tag and Robots Disallow
Hi all, I hope you can spend some time answering the first of a few questions 🙂 We are running a Magento site, and the layered/faceted navigation nightmare has created thousands of duplicate URLs! During my process of tackling the issue, I disallowed in robots.txt anything in the querystring that was not a p (allowed for pagination). After checking some pages in Google, I did a site:www.mydomain.com/specificpage.html and a few duplicates came up along with the original, showing "There is no information about this page because it is blocked by robots.txt". So I had also added Meta Noindex, Follow on all these duplicates, but I guess it wasn't being read because of robots.txt. So, coming to my questions: Did robots.txt block access to these pages? If so, were these already in the index, and after disallowing them with robots, Googlebot could not read the Meta Noindex? Does Meta Noindex, Follow on pages actually help Googlebot decide to remove these pages from the index? I thought robots.txt would stop and prevent indexation? But I've read this: "Noindex is a funny thing, it actually doesn't mean 'You can't index this', it means 'You can't show this in search results'. Robots.txt disallow means 'You can't index this' but it doesn't mean 'You can't show it in the search results'." I'm a bit confused about how to use these both to prevent duplicate content in the first place and to address dupe content once it's already in the index. Thanks! B
Intermediate & Advanced SEO | bjs2010
Links from a website or a subdomain: which would generate more SEO benefit?
I have a customer who just bought the domain (and full website) of a competitor and decided that they will no longer update the purchased website. My client's website has a Domain Authority of 50, and the DA of the purchased website is 45. Each was registered by a different company, and they are on different servers too. The reason for my message is that, because they are registered by different companies and sit on different servers, I could use the purchased site for link building to the main website (one-way links only, from the purchased website to the main website), or I could set up the purchased website as a subdomain of the main website and aggregate its content into the main website. In your opinion, which would generate more SEO benefit for the main website: links from the purchased website, or putting that website on a subdomain of the main site?
Intermediate & Advanced SEO | marciofelias
-
Domain vs Subdomain for Multi-Location Practice
I have a client who has 2 locations (Orlando & Tampa) and would like to keep the current domain for both locations (DA 29). We want to target additional cities within each service area. Each service area would target 2 cities on the main pages and 4-5 cities with "SEO" pages which contain unique content specific to the given city. Would I be better off creating subdomains (www.orlando.domain.com & www.tampa.domain.com), creating subfolders (www.domain.com/orlando, etc.), or keeping the domain as is and creating SEO pages specific to each city? We want to spread the domain authority to both locations.
Intermediate & Advanced SEO | Red_Spot_Interactive
-
Large volume of Ning files in subdomain - hurting or helping?
I have a client that has 600 pages in their root domain and a subdomain that contains 7,500 pages of un-SEOable Ning pages, PLUS another 650 pages from Sched.com that are also contributing to a large volume of errors. My question is: should I create a new domain for the Ning content, or am I better off with the volume of pages, even if they have loads of errors? Thanks!
Intermediate & Advanced SEO | robertdonnell