What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
-
Now that Google considers subdomains as part of the TLD, I'm a little leery of testing robots.txt with something like:
staging.domain.com
User-agent: *
Disallow: /
for fear that it might get www.domain.com blocked as well. Has anyone had any success using robots.txt to block sub-domains? I know I could add a meta robots tag to the staging.domain.com pages, but that would require a lot more work.
-
Just make sure that when/if you copy the staging site over to the live domain, you don't also copy over the robots.txt, .htaccess, or whatever other means you use to block the staging site from being indexed. Otherwise your shiny new site will be blocked too.
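One way to guard against that mistake is a small pre-deploy check. The sketch below is only an illustration, not anyone's actual deploy process: the function name `blocks_entire_site` is made up, and it relies on Python's standard `urllib.robotparser` to decide whether a robots.txt body forbids crawling the site root.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if these rules forbid user_agent from crawling "/".

    Hypothetical helper for a deploy script; the name is an assumption.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


# The staging rules from this thread would trip the check...
staging_rules = "User-agent: *\nDisallow: /\n"
# ...while a rule that only hides one directory would not.
live_rules = "User-agent: *\nDisallow: /admin/\n"

print(blocks_entire_site(staging_rules))
print(blocks_entire_site(live_rules))
```

A deploy script could refuse to publish to the live domain whenever this returns True for the robots.txt it is about to upload.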
-
I agree. The name of your subdomain being "staging" didn't register at all with me until Matt brought it up. I was offering a generic response to the subdomain question whereas I believe Matt focused on how to handle a staging site. Interesting viewpoint.
-
Matt/Ryan-
Great discussion, thanks for the input. The staging.domain.com is just one of the domains we don't want indexed. Some of them still need to be accessed by the public, some like staging could be restricted to specific IPs.
I realize after your discussion that I probably should have used a different example of a sub-domain. On the other hand, it might not have sparked the discussion, so maybe it was a good example.
-
.htaccess files can be placed at any directory level of a site so you can do it for just the subdomain or even just a directory of a domain.
-
Staging URLs are typically only used for testing, so rather than a deny rule I would recommend a specific allow rule for only the IP addresses that should have access.
I would imagine you don't want it indexed because you don't want the rest of the world knowing about it.
You can also use .htaccess to require a username and password. It is simple, and you can give the credentials to clients if that is a concern or need.
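To make the allow-versus-deny point concrete, here is a rough sketch of the decision logic in Python. It is not .htaccess syntax, only the idea behind an allow-list. The networks are documentation-range placeholders standing in for office or client IPs, and `is_allowed` is a made-up name.

```python
from ipaddress import ip_address, ip_network

# Assumption: placeholder networks, not real office or crawler addresses.
ALLOWED_NETWORKS = [
    ip_network("203.0.113.0/24"),   # e.g. an office range
    ip_network("198.51.100.7/32"),  # e.g. a single client machine
]


def is_allowed(client_ip: str) -> bool:
    """Allow-list check: anything not explicitly listed is refused."""
    addr = ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


print(is_allowed("203.0.113.42"))  # inside the listed /24
print(is_allowed("192.0.2.1"))     # not listed, so refused by default
```

With an allow-list, a visitor or crawler arriving from an address nobody anticipated is refused by default. A deny-list has the opposite failure mode: any address you did not think to list gets through.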
-
Correct.
-
Toren, I would not recommend that solution. There is nothing to prevent Googlebot from crawling your site via almost any IP. If you found 100 IPs used by the crawler and blocked them all, there is nothing to stop the crawler from using IP #101 next month. Once the subdomain's content is located and indexed, it will be a headache fixing the issue.
The best solution is always going to be a noindex meta tag on the pages you do not wish to be indexed. If that method is too much work or otherwise undesirable, you can use the robots.txt solution. There is no circumstance I can imagine where you would modify your .htaccess file to block Googlebot.
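If you go the meta tag route, it helps to verify that the tag really is on every page. A minimal sketch using only Python's standard `html.parser`; the class and function names are illustrative, and the HTML strings are invented samples rather than pages from any real site.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(sample))
```

One thing to keep in mind: a crawler has to be able to fetch a page in order to see its noindex tag, so a page that is also disallowed in robots.txt never gets its tag read.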
-
Hi Matt.
Perhaps I misunderstood the question but I believe Toren only wishes to prevent the subdomain from being indexed. If you restrict subdomain access by IP it would prevent visitors from accessing the content which I don't believe is the goal.
-
Interesting, hadn't thought of using .htaccess to block Googlebot. Thanks for the suggestion.
-
Thanks Ryan. So you don't see any issues with de-indexing the main site if I created a second robots.txt file, e.g.
http://staging.domain.com/robots.txt
User-agent: *
Disallow: /
That was my initial thought, but when Google announced they consider sub-domains part of the TLD I was afraid it might affect the http://www.domain.com versions of the pages. So you're saying the subdomain is basically treated like a folder you block on the primary domain?
-
Use an .htaccess file to only allow from certain ip addresses or ranges.
Here is an article describing how: http://www.kirupa.com/html5/htaccess_tricks.htm
-
What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
Place a robots.txt file in the root of the subdomain.
User-agent: *
Disallow: /
This method will block the subdomain while leaving your primary domain unaffected.
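The reason this is safe is that crawlers fetch robots.txt separately for each host, so the rules served at staging.domain.com/robots.txt are never applied to www.domain.com. Here is a small sketch of that behaviour using Python's standard `urllib.robotparser`. The robots.txt bodies and the `is_crawlable` helper are assumptions for illustration; nothing is fetched over the network.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumption: the robots.txt body each host would serve.
ROBOTS_BY_HOST = {
    "staging.domain.com": "User-agent: *\nDisallow: /\n",
    "www.domain.com": "User-agent: *\nDisallow:\n",
}


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Apply only the robots.txt that belongs to the URL's own host."""
    host = urlsplit(url).netloc
    parser = RobotFileParser()
    parser.parse(ROBOTS_BY_HOST.get(host, "").splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable("http://staging.domain.com/any/page"))  # staging rules apply
print(is_crawlable("http://www.domain.com/any/page"))      # www rules apply
```

Each lookup consults only the rules for its own host, which is why the blanket Disallow on the staging subdomain leaves the www pages crawlable.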