What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
-
Now that Google considers subdomains as part of the TLD I'm a little leery of testing robots.txt with something like:
staging.domain.com
User-agent: *
Disallow: /in fear it might get the www.domain.com blocked as well. Has anyone had any success using robots.txt to block sub-domains? I know I could add a meta robots tag to the staging.domain.com pages but that would require a lot more work.
-
Just make sure that when/if you copy over the staging site to the live domain that you don't copy over the robots.txt, htaccess, or whatever means you use to block that site from being indexed and thus have your shiny new site be blocked.
-
I agree. The name of your subdomain being "staging" didn't register at all with me until Matt brought it up. I was offering a generic response to the subdomain question whereas I believe Matt focused on how to handle a staging site. Interesting viewpoint.
-
Matt/Ryan-
Great discussion, thanks for the input. The staging.domain.com is just one of the domains we don't want indexed. Some of them still need to be accessed by the public, some like staging could be restricted to specific IPs.
I realize after your discussion I probably should have used a different example of a sub-domain. On the other hand it might not have sparked the discussion so maybe it was a good example
-
.htaccess files can be placed at any directory level of a site so you can do it for just the subdomain or even just a directory of a domain.
-
Staging URL's are typically only used for testing so rather than do a deny I would recommend using a specific ALLOW for only the IP addresses that should be allowed access.
I would imagine you don't want it indexed because you don't want the rest of the world knowing about it.
You can also use HTACCESS to use username/passwords. It is simple but you can give that to clients if that is a concern/need.
-
Correct.
-
Toren, I would not recommend that solution. There is nothing to prevent Googlebot from crawling your site via almost any IP. If you found 100 IPs used by the crawler and blocked them all, there is nothing to stop the crawler from using IP #101 next month. Once the subdomain's content is located and indexed, it will be a headache fixing the issue.
The best solution is always going to be a noindex meta tag on the pages you do not wish to be indexed. If that method is too much work or otherwise undesirable, you can use the robots.txt solution. There is no circumstance I can imagine where you would modify your htaccess file to block googlebot.
-
Hi Matt.
Perhaps I misunderstood the question but I believe Toren only wishes to prevent the subdomain from being indexed. If you restrict subdomain access by IP it would prevent visitors from accessing the content which I don't believe is the goal.
-
Interesting, hadn't thought of using htaccess to block Googlebot.Thanks for the suggestion.
-
Thanks Ryan. So you don't see any issues with de-indexing the main site if I created a second robots.txt file, e.g.
http://staging.domin.com/robots.txt
User-agent: *
Disallow: /That was my initial thought but when Google announced they consider sub-domains part of the TLD I was afraid it might affect the htp://www.domain.com versions of the pages. So you're saying the subdomain is basically treated like a folder you block on the primary domain?
-
Use an .htaccess file to only allow from certain ip addresses or ranges.
Here is an article describing how: http://www.kirupa.com/html5/htaccess_tricks.htm
-
What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
Place a robots.txt file in the root of the subdomain.
User-agent: *
Disallow: /This method will block the subdomain while leaving your primary domain unaffected.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I am looking for best way to block a domain from getting indexed ?
We have a website http://www.example.co.uk/ which leads to another domain (https://online.example.co.uk/) when a user clicks,in this case let us assume it to be Apply now button on my website page. We are getting meta data issues in crawler errors from this (https://online.example.co.uk/) domain as we are not targeting any meta content on this particular domain. So we are looking to block this domain from getting indexed to clear this errors & does this effect SERP's of this domain (**https://online.example.co.uk/) **if we use no index tag on this domain.
Technical SEO | | Prasadgotteti0 -
Sub Domain Redirect
Hey Everyone, Here is the situation : Currently, a website's sub domain is being redirected to the main website home page. We're having issues getting the sub domain pages indexed. Just want to confirm that it is because of the redirect on the sub domain URL. Should we kill the sub domain redirect and set it up as it's own page? Will that solve the indexing issue for the sub domain pages. More explanation below: subdomain.domain.com currently redirects to domain.com We're having issues indexing pages belonging to the sub domain ( subdomain.url.com/page1 or subdomain.url.com/page2) Appreciate your input in advance. Cheers,
Technical SEO | | SEO5Team0 -
Moving multiple domains into one domain
Hi, We're currently moving a group of websites (approximately 12) under one domain so we've moved from www.example.de , www.example.co.uk , www.example.com to www.example.com/de www.example.com/uk and so on. However I have read an article online today saying that this can lead to crawling complications. Has anyone done something similar and if there were any issues how did you overcome them? Many thanks
Technical SEO | | Creditsafe0 -
URLs in Greek, Greeklish or English? What is the best way to get great ranking?
Hello all, I am Greek and I have a quite strange question for you. Greek characters are generally recognized as special characters and need to have UTF-8 encoding. The question is about the URLs of Greek websites. According the advice of Google webmasters blog we should never put the raw greek characters into the URL of a link. We always should use the encoded version if we decide to have Greek characters and encode them or just use latin characters in the URL. Having Greek characters un-encoded could likely cause technical difficulties with some services, e.g. search engines or other url-processing web pages. To give you an example let's look at A) http://el.wikipedia.org/wiki/%CE%95%CE%BB%CE%B2%CE%B5%CF%84%CE%AF%CE%B1which is the URL with the encoded Greek characters and it shows up in the browser asB) http://el.wikipedia.org/wiki/Ελβετία The problem with A is that everytime we need to copy the URL and paste it somewhere (in an email, in a social bookmark site, social media site etc) the URL appears like the A, plenty of strange characters and %. This link sometimes may cause broken link issues especially when we try to submit it in social networks and social bookmarks. On the other hand, googlebot reads that url but I am wondering if there is an advantage for the websites who keep the encoded URLs or not (in compairison to the sites who use Greeklish in the URLs)! So the question is: For the SEO issues, is it better to use Greek characters (encoded like this one http://el.wikipedia.org/wiki/%CE%95%CE%BB%CE%B2%CE%B5%CF%84%CE%AF%CE%B1) in the URLs or would it be better to use just Greeklish (for example http://el.wikipedia.org/wiki/Elvetia ? Thank you very much for your help! Regards, Lenia
Technical SEO | | tevag0 -
301 Redirect / cross-domain canonical to a URL w/ Ampersand
I have a question regarding ampersands, we are needing to redirect to a URL w/ an ampersand in the URL: http://local.sfgate.com/b18915250/Sam-&-Associates-Insurance-Agency Will Google pass page authority/juice despite the fact that there is an ampersand in the URL, if we were to 301 redirect or cross-domain canonical to the url? Should we 301 redirect to http://local.sfgate.com/b18915250/Sam-%26-Associates-Insurance-Agency instead of http://local.sfgate.com/b18915250/Sam-&-Associates-Insurance-Agency? I don't have the option of removing the ampersand Thank you for your time!
Technical SEO | | Gatelist0 -
Any one worked on a sites.google.com/ website ?
Hi I have a client that has a sites.google.com/ website, Has anyone ever used one ? or had to do SEO on one ? Any help would be very much appreciated Thanks
Technical SEO | | tempowebdesign0 -
Page and domain authority 1/100
I have a fairly new domain less than year old showing page and domain authorities of 1/100 ose also will not perform a backlinks check. I have only just started to build back links so maybe I should be waiting a while yet? The site is indexed by google but I cannot see it in the top 100 how can I find out exactly where it is in the serps for example 1001 without manually crawling through the pages.
Technical SEO | | dynamic080 -
Getting More Pages Indexed
We have a large E-commerce site (magento based) and have submitted sitemap files for several million pages within Webmaster tools. The number of indexed pages seems to fluctuate, but currently there is less than 300,000 pages indexed out of 4 million submitted. How can we get the number of indexed pages to be higher? Changing the settings on the crawl rate and resubmitting site maps doesn't seem to have an effect on the number of pages indexed. Am I correct in assuming that most individual product pages just don't carry enough link juice to be considered important enough yet by Google to be indexed? Let me know if there are any suggestions or tips for getting more pages indexed. syGtx.png
Technical SEO | | Mattchstick0