Block an entire subdomain with robots.txt?
-
Is it possible to block an entire subdomain with robots.txt?
I write for a blog that has their root domain as well as a subdomain pointing to the exact same IP. Getting rid of the option is not an option so I'd like to explore other options to avoid duplicate content. Any ideas?
-
Awesome! That did the trick -- thanks for your help. The site is no longer listed
-
Fact is, the robots file alone will never work (the link has a good explanation why - short form: all it does is stop the bots from indexing again).
Best to request removal then wait a few days.
-
Yeah. As of yet, the site has not been de-indexed. We placed the conditional rule in htaccess and are getting different robots.txt files for the domain and subdomain -- so that works. But I've never done this before so I don't know how long it's supposed to take?
I'll try to verify via Webmaster Tools to speed up the process. Thanks
-
You should do a remove request in Google Webmaster Tools. You have to first verify the sub-domain then request the removal.
See this post on why the robots file alone won't work...
http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
-
Awesome. We used your second idea and so far it looks like it is working exactly how we want. Thanks for the idea.
Will report back to confirm that the subdomain has been de-indexed.
-
Option 1 could come with a small performance hit if you have a lot of txt files being used on the server.
There shouldn't be any negative side effects to option 2 if the rewrite is clean (IE not accidently a redirect) and the content of the two files are robots compliant.
Good luck
-
Thanks for the suggestion. I'll definitely have to do a bit more research into this one to make sure that it doesn't have any negative side effects before implementation
-
We have a plugin right now that places canonical tags, but unfortunately, the canonical for the subdomain points to the subdomain. I'll look around to see if I can tweak the settings
-
Sounds like (from other discussions) you may be stuck requiring a dynamic robot.txt file which detects what domain the bot is on and changes the content accordingly. This means the server has to run all .txt file as (I presume) PHP.
Or, you could conditionally rewrite the /robot.txt URL to a new file according to sub-domain
RewriteEngine on
RewriteCond %{HTTP_HOST} ^subdomain.website.com$
RewriteRule ^robotx.txt$ robots-subdomain.txtThen add:
User-agent: *
Disallow: /to the robots-subdomain.txt file
(untested)
-
Placing canonical tags isn't an option? Detect that the page is being viewed through the subdomain, and if so, write the canonical tag on the page back to the root domain?
Or, just place a canonical tag on every page pointing back to the root domain (so the subdomain and root domain pages would both have them). Apparently, it's ok to have a canonical tag on a page pointing to itself. I haven't tried this, but if Matt Cutts says it's ok...
-
Hey Ryan,
I wasn't directly involved with the decision to create the subdomain, but I'm told that it is necessary to create in order to bypass certain elements that were affecting the root domain.
Nevertheless, it is a blog and the users now need to login to the subdomain in order to access the Wordpress backend to bypass those elements. Traffic for the site still goes to the root domain.
-
They both point to the same location on the server? So there's not a different folder for the subdomain?
If that's the case then I suggest adding a rule to your htaccess file to 301 the subdomain back to the main domain in exactly the same way people redirect from non-www to www or vice-versa. However, you should ask why the server is configured to have a duplicate subdomain? You might just edit your apache settings to get rid of that subdomain (usually done through a cpanel interface).
Here is what your htaccess might look like:
<ifmodule mod_rewrite.c="">RewriteEngine on
# Redirect non-www to wwww
RewriteCond %{HTTP_HOST} !^www.mydomain.org [NC]
RewriteRule ^(.*)$ http://www.mydomain.org/$1 [R=301,L]</ifmodule> -
Not to me LOL I think you'll need someone with a bit more expertise in this area than I to assist in this case. Kyle, I'm sorry I couldn't offer more assistance... but I don't want to tell you something if I'm not 100% sure. I suspect one of the many bright SEOmozer's will quickly come to the rescue on this one.
Andy
-
Hey Andy,
Herein lies the problem. Since the domain and subdomain point to the exact same place, they both utilize the same robots.txt file.
Does that make sense?
-
Hi Kyle Yes, you can block an entire subdomain via robots.txt, however you'll need to create a robots.txt file and place it in the root of the subdomain, then add the code to direct the bots to stay away from the entire subdomain's content.
User-agent: *
Disallow: /hope this helps
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
block primary . xxx domain with disavow tool
Hi friends I discovered spam url attack on top sites with good google positions for specific keyword. Can I block primary domain like .xxx with disavow tool? There is hundreds of different domains but primary domain is always the same. for example like this domain:xxx? Thanks
Intermediate & Advanced SEO | | netcomsia0 -
Is Link equity / Link Juice lost to a blocked URL in the same way that it is lost to nofollow link
Hi If there is a link on a page that goes to a URL that is blocked in robots txt - is the link juice lost in the same way as when you add nofollow to a link on a page. Any help would be most appreciated.
Intermediate & Advanced SEO | | Andrew-SEO0 -
Robots.txt advice
Hey Guys, Have you ever seen coding like this in a robots.txt, I have never seen a noindex rule in a robots.txt file before - have you? user-agent: AhrefsBot User-agent: trovitBot
Intermediate & Advanced SEO | | eLab_London
User-agent: Nutch
User-agent: Baiduspider
Disallow: / User-agent: *
Disallow: /WebServices/
Disallow: /*?notfound=
Disallow: /?list=
Noindex: /?*list=
Noindex: /local/
Disallow: /local/
Noindex: /handle/
Disallow: /handle/
Noindex: /Handle/
Disallow: /Handle/
Noindex: /localsites/
Disallow: /localsites/
Noindex: /search/
Disallow: /search/
Noindex: /Search/
Disallow: /Search/
Disallow: ? I have never seen a noindex rule in a robots.txt file before - have you?
Any pointers?0 -
Keyword phrase for entire site
Hey everyone! I'm fairly new to SEO but I have a large number of sites I'm needing to SEO. I'm a tad confused as to how many keyword phrases I should use throughout my site. For example, my site is www.uluru.travel. I want to rank highlight for the phrase 'uluru tours' throughout the site, as many of my pages list uluru tours and people searching for this phrase are my type of customers. As you can see I've tried to do some basic on page SEO for that phrase by including it in page title, headings etc. But the entire site doesn't seem to rank very well. Would you guys suggest trying to target 'uluru tours' phrase throughout the entire site of just focus a couple of pages on this term? Any advice is greatly appreciated guys! Cheers
Intermediate & Advanced SEO | | Mysites0 -
HTTPS entire domain Vs. one URL
Long time no Moz! Ive been away with some server related issues, installing an AD at the company I work for, but I'm back. Our SSL cert just expired and I'm trying to determine the pros and cons of making an entire site SSL vs just the URL. Our previous set up was just a single domain. I know Google has hinted toward SSL preference, and I know its a little early to know for certain how much that's going to help, but I just wanted to know what everybody thought? It expired yesterday, so I have to do something. And we lost our previous credentials so I can't just renew the old one. Thanks!
Intermediate & Advanced SEO | | HashtagHustler0 -
E-Commerce Multilanguage - Better on Subdomains?
Hi, We have an e-commerce store in English and Spanish - same products. URLs differ like this: ENGLISH:
Intermediate & Advanced SEO | | bjs2010
www.mydomain.com/en/manufacturer-sku-productnameinenglish.html SPANISH:
www.mydomain.com/es/manufacturer-sku-productnameinspanish.html All content on pages is translated, e.g, H1, Titles, keywords, descriptions and site content itself is in the language displayed. Is there a risk of similar or near dupe content here in the eyes of the big G? Would it be worth implementing different languages on subdomains or completely different domains? thank you B0 -
Google: How to See URLs Blocked by Robots?
Google Webmaster Tools says we have 17K out of 34K URLs that are blocked by our Robots.txt file. How can I see the URLs that are being blocked? Here's our Robots.txt file. User-agent: * Disallow: /swish.cgi Disallow: /demo Disallow: /reviews/review.php/new/ Disallow: /cgi-audiobooksonline/sb/order.cgi Disallow: /cgi-audiobooksonline/sb/productsearch.cgi Disallow: /cgi-audiobooksonline/sb/billing.cgi Disallow: /cgi-audiobooksonline/sb/inv.cgi Disallow: /cgi-audiobooksonline/sb/new_options.cgi Disallow: /cgi-audiobooksonline/sb/registration.cgi Disallow: /cgi-audiobooksonline/sb/tellfriend.cgi Disallow: /*?gdftrk Sitemap: http://www.audiobooksonline.com/google-sitemap.xml
Intermediate & Advanced SEO | | lbohen0 -
Should Site Search results be blocked from search engines?
What are the advantages & disadvantages of letting Google crawl site search results? We currently have them blocked via robots.txt, so I'm not sure if we're missing out on potential traffic. Thanks!
Intermediate & Advanced SEO | | pbhatt0