Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Block an entire subdomain with robots.txt?
- 
					
					
					
					
 Is it possible to block an entire subdomain with robots.txt? I write for a blog that has its root domain as well as a subdomain pointing to the exact same IP. Getting rid of the subdomain is not an option, so I'd like to explore other ways to avoid duplicate content. Any ideas?
- 
					
					
					
					
 Awesome! That did the trick -- thanks for your help. The site is no longer listed.
- 
					
					
					
					
 Fact is, the robots file alone will never work (the link has a good explanation why - short form: all it does is stop the bots from crawling the pages again; it does not remove URLs that are already in the index). Best to request removal, then wait a few days.
- 
					
					
					
					
 Yeah. As of yet, the site has not been de-indexed. We placed the conditional rule in .htaccess and are getting different robots.txt files for the domain and subdomain -- so that works. But I've never done this before, so I don't know how long it's supposed to take. I'll try to verify via Webmaster Tools to speed up the process. Thanks.
- 
					
					
					
					
 You should submit a removal request in Google Webmaster Tools. You have to first verify the subdomain, then request the removal. See this post on why the robots file alone won't work... http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
- 
					
					
					
					
 Awesome. We used your second idea and so far it looks like it is working exactly how we want. Thanks for the idea. Will report back to confirm that the subdomain has been de-indexed. 
- 
					
					
					
					
 Option 1 could come with a small performance hit if you have a lot of .txt files being used on the server. There shouldn't be any negative side effects to option 2 if the rewrite is clean (i.e. not accidentally a redirect) and the content of the two files is robots.txt compliant. Good luck.
- 
					
					
					
					
 Thanks for the suggestion. I'll definitely have to do a bit more research into this one to make sure that it doesn't have any negative side effects before implementation.
- 
					
					
					
					
 We have a plugin right now that places canonical tags, but unfortunately, the canonical for the subdomain points to the subdomain. I'll look around to see if I can tweak the settings.
- 
					
					
					
					
 Sounds like (from other discussions) you may be stuck requiring a dynamic robots.txt file which detects what domain the bot is on and changes the content accordingly. This means the server has to run all .txt files as (I presume) PHP. Or, you could conditionally rewrite the /robots.txt URL to a new file according to subdomain:
 RewriteEngine on
 RewriteCond %{HTTP_HOST} ^subdomain\.website\.com$ [NC]
 RewriteRule ^robots\.txt$ robots-subdomain.txt [L]
 Then add:
 User-agent: *
 Disallow: /
 to the robots-subdomain.txt file (untested)
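 For the first option (a dynamic robots.txt), the core logic is just a lookup on the Host header. The post presumes PHP; the sketch below uses Python instead, purely as an illustration. The function name `robots_txt_for_host` and both constants are hypothetical, and the host name is the placeholder from the rewrite rule above. How the function gets wired into the web server is left out.

```python
# Hypothetical placeholder host, matching the example in the post above.
SUBDOMAIN_HOST = "subdomain.website.com"

# Body served to crawlers on the subdomain: block everything.
ROBOTS_BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# Body served everywhere else: an empty Disallow allows everything.
ROBOTS_ALLOW_ALL = "User-agent: *\nDisallow:\n"


def robots_txt_for_host(host_header: str) -> str:
    """Return the robots.txt body to serve for the given Host header."""
    # Drop an optional ":port" suffix and normalise case,
    # since host names are case-insensitive.
    host = host_header.split(":", 1)[0].strip().lower()
    if host == SUBDOMAIN_HOST:
        return ROBOTS_BLOCK_ALL
    return ROBOTS_ALLOW_ALL


if __name__ == "__main__":
    print(robots_txt_for_host("subdomain.website.com"))
    print(robots_txt_for_host("www.website.com"))
```

 The rewrite-rule option gets the same result without running any code per request, which is probably why it is the cheaper of the two.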
- 
					
					
					
					
 Placing canonical tags isn't an option? Detect that the page is being viewed through the subdomain, and if so, write the canonical tag on the page back to the root domain? Or, just place a canonical tag on every page pointing back to the root domain (so the subdomain and root domain pages would both have them). Apparently, it's OK to have a canonical tag on a page pointing to itself. I haven't tried this, but if Matt Cutts says it's OK...
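 A minimal sketch of the second variant (every page, on either host, emits a canonical tag pointing at the root domain). This is not how any particular WordPress plugin does it; `ROOT_HOST` and `canonical_tag` are hypothetical names, and the root domain is assumed to be website.com, following the placeholder host used elsewhere in the thread.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumed root domain (hypothetical placeholder, not confirmed in the thread).
ROOT_HOST = "website.com"


def canonical_tag(requested_url: str) -> str:
    """Build a canonical <link> tag that always points at the root domain."""
    parts = urlsplit(requested_url)
    # Keep scheme, path and query; swap the host; drop any fragment.
    canonical = urlunsplit((parts.scheme, ROOT_HOST, parts.path, parts.query, ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


if __name__ == "__main__":
    print(canonical_tag("http://subdomain.website.com/blog/post-1"))
    print(canonical_tag("http://website.com/blog/post-1"))
```

 Both calls produce the same tag, so the root-domain page simply ends up with a canonical pointing to itself.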
- 
					
					
					
					
 Hey Ryan, I wasn't directly involved with the decision to create the subdomain, but I'm told that it was necessary in order to bypass certain elements that were affecting the root domain. Nevertheless, it is a blog, and the users now need to log in to the subdomain in order to access the WordPress backend and bypass those elements. Traffic for the site still goes to the root domain.
- 
					
					
					
					
 They both point to the same location on the server? So there's not a different folder for the subdomain? If that's the case, then I suggest adding a rule to your .htaccess file to 301 the subdomain back to the main domain, in exactly the same way people redirect from non-www to www or vice versa. However, you should ask why the server is configured to have a duplicate subdomain. You might just edit your Apache settings to get rid of that subdomain (usually done through a cPanel interface). Here is what your .htaccess might look like:
 <IfModule mod_rewrite.c>
 RewriteEngine on
 # Redirect non-www to www
 RewriteCond %{HTTP_HOST} !^www\.mydomain\.org [NC]
 RewriteRule ^(.*)$ http://www.mydomain.org/$1 [R=301,L]
 </IfModule>
- 
					
					
					
					
 Not to me LOL. I think you'll need someone with a bit more expertise in this area than I to assist in this case. Kyle, I'm sorry I couldn't offer more assistance... but I don't want to tell you something if I'm not 100% sure. I suspect one of the many bright SEOmozers will quickly come to the rescue on this one. Andy
- 
					
					
					
					
 Hey Andy, Herein lies the problem. Since the domain and subdomain point to the exact same place, they both utilize the same robots.txt file. Does that make sense? 
- 
					
					
					
					
 Hi Kyle, Yes, you can block an entire subdomain via robots.txt; however, you'll need to create a robots.txt file and place it in the root of the subdomain, then add the code to direct the bots to stay away from the entire subdomain's content:
 User-agent: *
 Disallow: /
 Hope this helps.
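 To sanity-check what those two lines do, the rules can be run through Python's standard-library `urllib.robotparser`. This is only a sketch: the helper `is_allowed` and the example URLs are hypothetical, and Python's parser is not Googlebot, so treat the result as a rough check rather than proof of how any search engine will behave.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt suggested above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Placeholder URLs; every path on the host should be disallowed.
    print(is_allowed(BLOCK_ALL, "http://subdomain.website.com/"))
    print(is_allowed(BLOCK_ALL, "http://subdomain.website.com/any/page.html"))
```

 Note that this only tells you whether crawling is blocked. It says nothing about URLs that are already in the index.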