Block an entire subdomain with robots.txt?
-
Is it possible to block an entire subdomain with robots.txt?
I write for a blog that has their root domain as well as a subdomain pointing to the exact same IP. Getting rid of the option is not an option so I'd like to explore other options to avoid duplicate content. Any ideas?
-
Awesome! That did the trick -- thanks for your help. The site is no longer listed
-
Fact is, the robots file alone will never work (the link has a good explanation why - short form: all it does is stop the bots from indexing again).
Best to request removal then wait a few days.
-
Yeah. As of yet, the site has not been de-indexed. We placed the conditional rule in htaccess and are getting different robots.txt files for the domain and subdomain -- so that works. But I've never done this before so I don't know how long it's supposed to take?
I'll try to verify via Webmaster Tools to speed up the process. Thanks
-
You should do a remove request in Google Webmaster Tools. You have to first verify the sub-domain then request the removal.
See this post on why the robots file alone won't work...
http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
-
Awesome. We used your second idea and so far it looks like it is working exactly how we want. Thanks for the idea.
Will report back to confirm that the subdomain has been de-indexed.
-
Option 1 could come with a small performance hit if you have a lot of txt files being used on the server.
There shouldn't be any negative side effects to option 2 if the rewrite is clean (IE not accidently a redirect) and the content of the two files are robots compliant.
Good luck
-
Thanks for the suggestion. I'll definitely have to do a bit more research into this one to make sure that it doesn't have any negative side effects before implementation
-
We have a plugin right now that places canonical tags, but unfortunately, the canonical for the subdomain points to the subdomain. I'll look around to see if I can tweak the settings
-
Sounds like (from other discussions) you may be stuck requiring a dynamic robot.txt file which detects what domain the bot is on and changes the content accordingly. This means the server has to run all .txt file as (I presume) PHP.
Or, you could conditionally rewrite the /robot.txt URL to a new file according to sub-domain
RewriteEngine on
RewriteCond %{HTTP_HOST} ^subdomain.website.com$
RewriteRule ^robotx.txt$ robots-subdomain.txtThen add:
User-agent: *
Disallow: /to the robots-subdomain.txt file
(untested)
-
Placing canonical tags isn't an option? Detect that the page is being viewed through the subdomain, and if so, write the canonical tag on the page back to the root domain?
Or, just place a canonical tag on every page pointing back to the root domain (so the subdomain and root domain pages would both have them). Apparently, it's ok to have a canonical tag on a page pointing to itself. I haven't tried this, but if Matt Cutts says it's ok...
-
Hey Ryan,
I wasn't directly involved with the decision to create the subdomain, but I'm told that it is necessary to create in order to bypass certain elements that were affecting the root domain.
Nevertheless, it is a blog and the users now need to login to the subdomain in order to access the Wordpress backend to bypass those elements. Traffic for the site still goes to the root domain.
-
They both point to the same location on the server? So there's not a different folder for the subdomain?
If that's the case then I suggest adding a rule to your htaccess file to 301 the subdomain back to the main domain in exactly the same way people redirect from non-www to www or vice-versa. However, you should ask why the server is configured to have a duplicate subdomain? You might just edit your apache settings to get rid of that subdomain (usually done through a cpanel interface).
Here is what your htaccess might look like:
<ifmodule mod_rewrite.c="">RewriteEngine on
# Redirect non-www to wwww
RewriteCond %{HTTP_HOST} !^www.mydomain.org [NC]
RewriteRule ^(.*)$ http://www.mydomain.org/$1 [R=301,L]</ifmodule> -
Not to me LOL I think you'll need someone with a bit more expertise in this area than I to assist in this case. Kyle, I'm sorry I couldn't offer more assistance... but I don't want to tell you something if I'm not 100% sure. I suspect one of the many bright SEOmozer's will quickly come to the rescue on this one.
Andy
-
Hey Andy,
Herein lies the problem. Since the domain and subdomain point to the exact same place, they both utilize the same robots.txt file.
Does that make sense?
-
Hi Kyle Yes, you can block an entire subdomain via robots.txt, however you'll need to create a robots.txt file and place it in the root of the subdomain, then add the code to direct the bots to stay away from the entire subdomain's content.
User-agent: *
Disallow: /hope this helps
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Site Name as displayed in serp is a completely random company in an entirely different industry!
Hello! I am looking at a site that has its site name displayed incorrectly in the SERPS. The site name is showing as a company from the airline industry while the site in question provides hypnotherapy services. This issue is on every page except the homepage. I have checked source code and the structured data. Local business structured data with the correct company name - I can't find anything at all in the source code that refers to this random airline! Thank you for reading this and for any ideas or suggestions! Here is a screenshot that illustrates the issueUntitled.jpg
Intermediate & Advanced SEO | | sean.e0 -
Will deindexing a subdomain negate the benefit of backlinks leading to that subdomain?
My client has a subdomain from their main site where their online waiver tool lives. Currently, all the waivers generated by users are creating indexed pages, I feel they should deindex that subdomain entirely. However, a lot of their backlinks are from their clients linking to their waivers. If they end up deindexing their subdomain, will they lose the SEO benefit of backlinks pointing to that subdomain? Thanks! Jay
Intermediate & Advanced SEO | | MCC_DSM0 -
Changing URL to a subdomain?
Hi there, I had a website www.footballshirtcollective.com that has been live since July. It contains both content and eCommerce. I am now separating out the content so that; 1. The master domain is www.footballshirtcollective.com (content) pointing to a new site 2. Subdomain is store.footballshirtcollective.com (ecommerce) - pointing to the existing site. What do you advise I can do to minimise the impact on my search? Many thanks Mike
Intermediate & Advanced SEO | | mjmaxwell0 -
Block Googlebot from submit button
Hi, I have a website where many searches are made by the googlebot on our internal engine. We can make noindex on result page, but we want to stop the bot to call the ajax search button - GET form (because it pass a request to an external API with associate fees). So, we want to stop crawling the form button, without noindex the search page itself. The "nofollow" tag don't seems to apply on button's submit. Any suggestion?
Intermediate & Advanced SEO | | Olivier_Lambert0 -
Changing a subdomain to a full domain to rank for a keyword
We have been attempting to get our blogsite to rank for our business name(Instabill). We are now considering changing the url from blog.instabill.com to something like instabillblog.com. I have following concerns about the change; Will changing the domain really be that helpful (i.e. will the change get our blog on page one for the term instabill) We have over 350 pages of content on our blog. Will changing the domain have possible negative effects ( I was thinking of using url updater in webmaster tools and creating a permanent 301 redirect from the older url to the new) Having never changed a url for a site with this much content and seo value for my company I would like to know the following from someone who has made mistakes here before; what not to do what steps you would take to make the transition easier Any help here will be greatly appreciated. cheers, Instabill
Intermediate & Advanced SEO | | Instabill0 -
Subdomains for US Regions
The company I work for is expanding their business to new territories. I've got a lot of stabilization to do in the region/state where we're one of the most well known companies of our kind. Currently, we have 3 distinct product lines which are currently distinguished by 3 separate URLS. This is affecting the user flow of our site, so we'd like to clean it up before launching our products into the various regions. The business has decided to grow into 5 new states (one state consisting of one county only) — none of which will feature all 3 products. Our homebase state is the only one that will have all 3 products this year. My initial thought was to use subdomains to separate out the regions, that way we could use a canonical tag to stabilize the root domain (which would feature home state content, and support content for all regions), and remove us from potential duplicate content penalization. Our product content will be nearly identical across the regions for the first year. I second guessed myself by thinking that it was perhaps better to use a "[product].root/region" URL instead. And I'm currently stuck by wondering if it was not better to build out subdomains for products and regions...using one modifier or the other as a funnel/branding page into the other. For instance, user lands on "region.root.com" and sees exactly what products we offer in that region. Basically, a tailored landing page. Meanwhile the bulk of the product content would actually live under "product.root.com/region/page". My head is spinning. And while searching for similar questions I also bumped into reference of another tag meant to be used in some similar cases to mine. I feel like there's a lot of risks involved in this subdomain strategy, but I also can't help but see the benefits in the user flow.
Intermediate & Advanced SEO | | taylor.craig0 -
Negative impact on crawling after upload robots.txt file on HTTPS pages
I experienced negative impact on crawling after upload robots.txt file on HTTPS pages. You can find out both URLs as follow. Robots.txt File for HTTP: http://www.vistastores.com/robots.txt Robots.txt File for HTTPS: https://www.vistastores.com/robots.txt I have disallowed all crawlers for HTTPS pages with following syntax. User-agent: *
Intermediate & Advanced SEO | | CommercePundit
Disallow: / Does it matter for that? If I have done any thing wrong so give me more idea to fix this issue.0 -
Does Blocking ICMP Requests Affect SEO?
All in the title really. One of our clients came up with errors with a server header check, so I pinged them and it times out. The hosting company have told them that it's because they're blocking ICMP requests and this doesn't affect SEO at all... but I know that sometimes pinging posts, etc... can be beneficial so is this correct? Thanks, Steve.
Intermediate & Advanced SEO | | SteveOllington0