What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
-
Now that Google considers subdomains as part of the TLD I'm a little leery of testing robots.txt with something like:
staging.domain.com
User-agent: *
Disallow: /in fear it might get the www.domain.com blocked as well. Has anyone had any success using robots.txt to block sub-domains? I know I could add a meta robots tag to the staging.domain.com pages but that would require a lot more work.
-
Just make sure that when/if you copy over the staging site to the live domain that you don't copy over the robots.txt, htaccess, or whatever means you use to block that site from being indexed and thus have your shiny new site be blocked.
-
I agree. The name of your subdomain being "staging" didn't register at all with me until Matt brought it up. I was offering a generic response to the subdomain question whereas I believe Matt focused on how to handle a staging site. Interesting viewpoint.
-
Matt/Ryan-
Great discussion, thanks for the input. The staging.domain.com is just one of the domains we don't want indexed. Some of them still need to be accessed by the public, some like staging could be restricted to specific IPs.
I realize after your discussion I probably should have used a different example of a sub-domain. On the other hand it might not have sparked the discussion so maybe it was a good example
-
.htaccess files can be placed at any directory level of a site so you can do it for just the subdomain or even just a directory of a domain.
-
Staging URL's are typically only used for testing so rather than do a deny I would recommend using a specific ALLOW for only the IP addresses that should be allowed access.
I would imagine you don't want it indexed because you don't want the rest of the world knowing about it.
You can also use HTACCESS to use username/passwords. It is simple but you can give that to clients if that is a concern/need.
-
Correct.
-
Toren, I would not recommend that solution. There is nothing to prevent Googlebot from crawling your site via almost any IP. If you found 100 IPs used by the crawler and blocked them all, there is nothing to stop the crawler from using IP #101 next month. Once the subdomain's content is located and indexed, it will be a headache fixing the issue.
The best solution is always going to be a noindex meta tag on the pages you do not wish to be indexed. If that method is too much work or otherwise undesirable, you can use the robots.txt solution. There is no circumstance I can imagine where you would modify your htaccess file to block googlebot.
-
Hi Matt.
Perhaps I misunderstood the question but I believe Toren only wishes to prevent the subdomain from being indexed. If you restrict subdomain access by IP it would prevent visitors from accessing the content which I don't believe is the goal.
-
Interesting, hadn't thought of using htaccess to block Googlebot.Thanks for the suggestion.
-
Thanks Ryan. So you don't see any issues with de-indexing the main site if I created a second robots.txt file, e.g.
http://staging.domin.com/robots.txt
User-agent: *
Disallow: /That was my initial thought but when Google announced they consider sub-domains part of the TLD I was afraid it might affect the htp://www.domain.com versions of the pages. So you're saying the subdomain is basically treated like a folder you block on the primary domain?
-
Use an .htaccess file to only allow from certain ip addresses or ranges.
Here is an article describing how: http://www.kirupa.com/html5/htaccess_tricks.htm
-
What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
Place a robots.txt file in the root of the subdomain.
User-agent: *
Disallow: /This method will block the subdomain while leaving your primary domain unaffected.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved Why My site pages getting video index viewport issue?
Hello, I have been publishing a good number of blogs on my site Flooring Flow. Though, there's been an error of the video viewport on some of my articles. I have tried fixing it but the error is still showing in Google Search Console. Can anyone help me fix it out?
Technical SEO | | mitty270 -
301 Domain Redirect from old domain with HTTPS
My domain was indexed with HTTPS://WWW. now that we redirected it the certificate has been removed and if you try to visit the old site with https it throws an obvious error that this sites not secure and the 301 does not happen. My question is will googles bot have this issue. Right now the domain has been in redirection status to the new domain for a couple months and the old site is still indexed, while the new one is not ranking well for half its terms. If that is not causing the problem can anyone tell me why would the 301 take such a long time. Ive double and quadruple checked the 301's and all settings to ensure its being redirected properly. Yet it still hasn't fully redirected. Something is wrong and my clients ready to ditch the old domain we worked on for a good amount of time. backgorund:About 30 days ago we found some redirect loops .. well not loop but it was redirecting from old domain to the new domain several times without error. I removed the plugins causing the multi redirects and now we have just one redirect from any page on the old domain to the new https version. Any suggestions? This is really frustrating me and I just can't figure it out. My only answer at this point is wait it out because others have had this issue where it takes up to 2 months to redirect the domain. My only issue is that this is the first domain redirect out of many that have ever taken more than a week or three.
Technical SEO | | waqid0 -
Blog page won't get indexed
Hi Guys, I'm currently asked to work on a website. I noticed that the blog posts won't get indexed in Google. www.domain.com/blog does get indexed but the blogposts itself won't. They have been online for over 2 months now. I found this in the robots.txt file: Allow: / Disallow: /kitchenhandle/ Disallow: /blog/comments/ Disallow: /blog/author/ Disallow: /blog/homepage/feed/ I'm guessing that the last line causes this issue. Does anyone have an idea if this is the case and why they would include this in the robots.txt? Cheers!
Technical SEO | | Happy-SEO2 -
When doing internal linking back to your home/index file what is the best coding course of action?
When doing internal linking back to your home/index page is it best to set the code as linked to "www.thedomain.com" or "www.thedomain.com/" or just "/" - I'm attempting some canonicalization and our programmer is concerned about linking to just the URL as he's saying it's going to be viewed as an external source. We have www redirects in place that come back to just www.thedomain.com and a redirect to send the www.thedomain.com/index.php back to just www.thedomain.com . Any help would be appreciated, thank you!
Technical SEO | | CharlesDaniels0 -
Can someone help me get this site ranked? www.2sponsors.com
Hi, I am have been trying for months to get a site ranked for one of my customers and I am not doing very well. I have been doing SEO for years and have gotten lots of sites ranked but this one has been the most difficult. Does anyone have time to look at it for me? Thanks The sites PR=4. I am trying to get it ranked in www.google.com.ar Thanks Carla skype: carla.dawson78
Technical SEO | | Carla_Dawson0 -
How to get Google to index another page
Hi, I will try to make my question clear, although it is a bit complex. For my site the most important keyword is "Insurance" or at least the danish variation of this. My problem is that Google are'nt indexing my frontpage on this, but are indexing a subpage - www.mydomain.dk/insurance instead of www.mydomain.dk. My link bulding will be to subpages and to my main domain, but i wont be able to get that many links to www.mydomain.dk/insurance. So im interested in making my frontpage the page that is my main page for the keyword insurance, but without just blowing the traffic im getting from the subpage at the moment. Is there any solutions to do this? Thanks in advance.
Technical SEO | | Petersen110 -
A rel="canonical" to www.homepage.com/home.aspx Hurts my Rank?
Hello, The CMS that I use makes 3 versions of the homepage:
Technical SEO | | EvolveCreative
www.homepage.com/home.aspx homepage.com homepage.com/default.aspx By default the CMS is set to rel=canonical all versions to the www.homepage.com/home.aspx version. If someone were to link to a website they most likely aren't going to link to www.homepage.com/home.aspx, they'll link to www.homepage.com which makes that link juice flow through the canonical to www.homepage.com/home.aspx right? Why make that extra loop at all? Wouldn't that be splitting the juice? I know 301's loose 1-5 % juice, but not sure about canonical. I assume it works the same way? Thanks! http://yoursiteroot/0 -
Correct 301 of domain inclusive "/"
Do I have to redirect "/" in the domain by default? My root domain is e.g. petra.at
Technical SEO | | petrakraft
--> I redirect via 301 to www.petra.at Do I have to do that with petra.at/ and www.petra.at/, too?0