Block an entire subdomain with robots.txt?
-
Is it possible to block an entire subdomain with robots.txt?
I write for a blog that has their root domain as well as a subdomain pointing to the exact same IP. Getting rid of the option is not an option so I'd like to explore other options to avoid duplicate content. Any ideas?
-
Awesome! That did the trick -- thanks for your help. The site is no longer listed
-
Fact is, the robots file alone will never work (the link has a good explanation why - short form: all it does is stop the bots from indexing again).
Best to request removal then wait a few days.
-
Yeah. As of yet, the site has not been de-indexed. We placed the conditional rule in htaccess and are getting different robots.txt files for the domain and subdomain -- so that works. But I've never done this before so I don't know how long it's supposed to take?
I'll try to verify via Webmaster Tools to speed up the process. Thanks
-
You should do a remove request in Google Webmaster Tools. You have to first verify the sub-domain then request the removal.
See this post on why the robots file alone won't work...
http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
-
Awesome. We used your second idea and so far it looks like it is working exactly how we want. Thanks for the idea.
Will report back to confirm that the subdomain has been de-indexed.
-
Option 1 could come with a small performance hit if you have a lot of txt files being used on the server.
There shouldn't be any negative side effects to option 2 if the rewrite is clean (IE not accidently a redirect) and the content of the two files are robots compliant.
Good luck
-
Thanks for the suggestion. I'll definitely have to do a bit more research into this one to make sure that it doesn't have any negative side effects before implementation
-
We have a plugin right now that places canonical tags, but unfortunately, the canonical for the subdomain points to the subdomain. I'll look around to see if I can tweak the settings
-
Sounds like (from other discussions) you may be stuck requiring a dynamic robot.txt file which detects what domain the bot is on and changes the content accordingly. This means the server has to run all .txt file as (I presume) PHP.
Or, you could conditionally rewrite the /robot.txt URL to a new file according to sub-domain
RewriteEngine on
RewriteCond %{HTTP_HOST} ^subdomain.website.com$
RewriteRule ^robotx.txt$ robots-subdomain.txtThen add:
User-agent: *
Disallow: /to the robots-subdomain.txt file
(untested)
-
Placing canonical tags isn't an option? Detect that the page is being viewed through the subdomain, and if so, write the canonical tag on the page back to the root domain?
Or, just place a canonical tag on every page pointing back to the root domain (so the subdomain and root domain pages would both have them). Apparently, it's ok to have a canonical tag on a page pointing to itself. I haven't tried this, but if Matt Cutts says it's ok...
-
Hey Ryan,
I wasn't directly involved with the decision to create the subdomain, but I'm told that it is necessary to create in order to bypass certain elements that were affecting the root domain.
Nevertheless, it is a blog and the users now need to login to the subdomain in order to access the Wordpress backend to bypass those elements. Traffic for the site still goes to the root domain.
-
They both point to the same location on the server? So there's not a different folder for the subdomain?
If that's the case then I suggest adding a rule to your htaccess file to 301 the subdomain back to the main domain in exactly the same way people redirect from non-www to www or vice-versa. However, you should ask why the server is configured to have a duplicate subdomain? You might just edit your apache settings to get rid of that subdomain (usually done through a cpanel interface).
Here is what your htaccess might look like:
<ifmodule mod_rewrite.c="">RewriteEngine on
# Redirect non-www to wwww
RewriteCond %{HTTP_HOST} !^www.mydomain.org [NC]
RewriteRule ^(.*)$ http://www.mydomain.org/$1 [R=301,L]</ifmodule> -
Not to me LOL I think you'll need someone with a bit more expertise in this area than I to assist in this case. Kyle, I'm sorry I couldn't offer more assistance... but I don't want to tell you something if I'm not 100% sure. I suspect one of the many bright SEOmozer's will quickly come to the rescue on this one.
Andy
-
Hey Andy,
Herein lies the problem. Since the domain and subdomain point to the exact same place, they both utilize the same robots.txt file.
Does that make sense?
-
Hi Kyle Yes, you can block an entire subdomain via robots.txt, however you'll need to create a robots.txt file and place it in the root of the subdomain, then add the code to direct the bots to stay away from the entire subdomain's content.
User-agent: *
Disallow: /hope this helps
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can my affiliate subdomain hurt in any way?
Hello everyone, My main website is: http://www.virtualsheetmusic.com Whereas the above site's related "affiliate" website is located on the subdomain below: http://affiliates.virtualsheetmusic.com I was wondering if having that "affiliate section" on a subdomain could affect the main website negatively in some way... or would be better to put it in a sub-folder on the main website, or even on a totally different domain. Thanks in advance for any advice!
Intermediate & Advanced SEO | | fablau0 -
Our parent company has included their sitemap links in our robots.txt file - will that have an impact on the way our site is crawled?
Our parent company has included their sitemap links in our robots.txt file. All of their sitemap links are on a different domain and I'm wondering if this will have any impact on our searchability or potential rankings.
Intermediate & Advanced SEO | | tsmith1310 -
How much SEO damage would it do having a subdomain site rather directory site?
Hi all! With a coleague we were arguing about what is better: Having a subdomain or a directory.
Intermediate & Advanced SEO | | Gaston Riera
Let me explain some more, this is about the cases: Having a multi-language site: Where en.domain.com or es.domain.com rather than domain.com/en/ or domain.com/es/ Having a Mobile and desktop version: m.domain.com or domain.com rather than domain.com/m or just domain.com. Having multiple location websites, you might figure. The dicussion started with me saying: Its better to have a directory site.
And my coleague said: Its better to have a subdomain site. Some of the reasons that he said is that big companies (such as wordpress) are doing that. And that's better for the business.
My reasons are fully based on this post from Rand Fishkin: Subdomains vs. Subfolders, Rel Canonical vs. 301, and How to Structure Links for SEO - Whiteboard Friday So, what does the community have to say about this?
Who should win this argue? GR.0 -
Blog - subdomain vs. subfolderq
Hi everyone I work on an ecommerce site and I'm trying to get more content together for the site & blog. The development team want to put the blog we have on a subdomain of our site, my question is - what is better for SEO Subfolder vs. subdomain I've read a couple of articles to say subfolder is better and a subdomain needs a lot of management to build up authority itself? Thanks!
Intermediate & Advanced SEO | | BeckyKey0 -
Cookieless subdomains Vs SEO
We have one .com that has all our unique content and then 25 other ccltd sites that are translated versions of the .com for each country we operate in. They are not linked together but we have href lang'd it all together. We now want to serve up all static content of our global website (26 local country sites, .com, .co.uk, .se, etc) from one cookie-less subdomain. Benefit is speed improvement. The question is whether from an SEO perspective, can all static content come from static.domain.com or should we do one for each ccltd where it would come form static.domain.xx (where xx is localised to the domain in question)
Intermediate & Advanced SEO | | aires-fb770 -
Issue with Robots.txt file blocking meta description
Hi, Can you please tell me why the following error is showing up in the serps for a website that was just re-launched 7 days ago with new pages (301 redirects are built in)? A description for this result is not available because of this site's robots.txt – learn more. Once we noticed it yesterday, we made some changed to the file and removed the amount of items in the disallow list. Here is the current Robots.txt file: # XML Sitemap & Google News Feeds version 4.2 - http://status301.net/wordpress-plugins/xml-sitemap-feed/ Sitemap: http://www.website.com/sitemap.xml Sitemap: http://www.website.com/sitemap-news.xml User-agent: * Disallow: /wp-admin/ Disallow: /wp-includes/ Other notes... the site was developed in WordPress and uses that followign plugins: WooCommerce All-in-One SEO Pack Google Analytics for WordPress XML Sitemap Google News Feeds Currently, in the SERPs, it keeps jumping back and forth between showing the meta description for the www domain and showing the error message (above). Originally, WP Super Cache was installed and has since been deactivated, removed from WP-config.php and deleted permanently. One other thing to note, we noticed yesterday that there was an old xml sitemap still on file, which we have since removed and resubmitted a new one via WMT. Also, the old pages are still showing up in the SERPs. Could it just be that this will take time, to review the new sitemap and re-index the new site? If so, what kind of timeframes are you seeing these days for the new pages to show up in SERPs? Days, weeks? Thanks, Erin ```
Intermediate & Advanced SEO | | HiddenPeak0 -
About robots.txt for resolve Duplicate content
I have a trouble with Duplicate content and title, i try to many way to resolve them but because of the web code so i am still in problem. I decide to use robots.txt to block contents that are duplicate. The first Question: How do i use command in robots.txt to block all of URL like this: http://vietnamfoodtour.com/foodcourses/Cooking-School/
Intermediate & Advanced SEO | | magician
http://vietnamfoodtour.com/foodcourses/Cooking-Class/ ....... User-agent: * Disallow: /foodcourses ( Is that right? ) And the parameter URL: h
ttp://vietnamfoodtour.com/?mod=vietnamfood&page=2
http://vietnamfoodtour.com/?mod=vietnamfood&page=3
http://vietnamfoodtour.com/?mod=vietnamfood&page=4 User-agent: * Disallow: /?mod=vietnamfood ( Is that right? i have folder contain module, could i use: disallow:/module/*) The 2nd question is: Which is the priority " robots.txt" or " meta robot"? If i use robots.txt to block URL, but in that URL my meta robot is "index, follow"0 -
Subdirectory vs. Subdomain
I work for a large franchise organization that is weighing the pros and cons of using subdomains versus subdirectories for our franchisee locations. What are the pros and cons of each approach?
Intermediate & Advanced SEO | | Glassdoctordfw0