Block an entire subdomain with robots.txt?
-
Is it possible to block an entire subdomain with robots.txt?
I write for a blog that has its root domain as well as a subdomain pointing to the exact same IP. Getting rid of the subdomain is not an option, so I'd like to explore other ways to avoid duplicate content. Any ideas?
-
Awesome! That did the trick -- thanks for your help. The site is no longer listed
-
Fact is, the robots file alone will never work (the link has a good explanation why; short form: all it does is stop the bots from crawling again, it doesn't remove pages that are already indexed).
Best to request removal then wait a few days.
-
Yeah, as of yet the site has not been de-indexed. We placed the conditional rule in .htaccess and are getting different robots.txt files for the domain and subdomain, so that works. But I've never done this before, so I don't know how long it's supposed to take.
I'll try to verify via Webmaster Tools to speed up the process. Thanks
-
You should do a removal request in Google Webmaster Tools. You have to first verify the sub-domain, then request the removal.
See this post on why the robots file alone won't work...
http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
-
Awesome. We used your second idea and so far it looks like it is working exactly how we want. Thanks for the idea.
Will report back to confirm that the subdomain has been de-indexed.
-
Option 1 could come with a small performance hit if you have a lot of .txt files being served on the server.
There shouldn't be any negative side effects to option 2 if the rewrite is clean (i.e. not accidentally a redirect) and the contents of the two files are robots-compliant.
Good luck
-
Thanks for the suggestion. I'll definitely have to do a bit more research into this one to make sure that it doesn't have any negative side effects before implementation.
-
We have a plugin right now that places canonical tags, but unfortunately, the canonical for the subdomain points to the subdomain itself. I'll look around to see if I can tweak the settings.
-
Sounds like (from other discussions) you may be stuck requiring a dynamic robots.txt file which detects which domain the bot is on and changes the content accordingly. This means the server has to run all .txt files as (I presume) PHP.
Or, you could conditionally rewrite the /robots.txt URL to a different file according to the sub-domain:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^subdomain\.website\.com$
RewriteRule ^robots\.txt$ robots-subdomain.txt [L]
Then add:
User-agent: *
Disallow: /
to the robots-subdomain.txt file.
(untested)
-
Placing canonical tags isn't an option? Detect that the page is being viewed through the subdomain, and if so, write the canonical tag on the page back to the root domain?
Or, just place a canonical tag on every page pointing back to the root domain (so the subdomain and root domain pages would both have them). Apparently, it's ok to have a canonical tag on a page pointing to itself. I haven't tried this, but if Matt Cutts says it's ok...
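For reference, the tag itself is a single line in the page head; example.com and the path here are stand-ins, not from the thread:

```html
<!-- Served on both the subdomain and root-domain copies of the page,
     always pointing at the root-domain URL (hypothetical domain/path) -->
<link rel="canonical" href="http://www.example.com/some-page/" />
```

On the root-domain copy this is the self-referencing canonical mentioned above; on the subdomain copy it consolidates the duplicate back to the root.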
-
Hey Ryan,
I wasn't directly involved with the decision to create the subdomain, but I'm told it was necessary in order to bypass certain elements that were affecting the root domain.
Nevertheless, it is a blog, and users now need to log in to the subdomain in order to access the WordPress backend and bypass those elements. Traffic for the site still goes to the root domain.
-
They both point to the same location on the server? So there's not a different folder for the subdomain?
If that's the case, then I suggest adding a rule to your .htaccess file to 301 the subdomain back to the main domain, in exactly the same way people redirect from non-www to www or vice versa. However, you should ask why the server is configured with a duplicate subdomain in the first place; you might just edit your Apache settings to get rid of that subdomain (usually done through a cPanel interface).
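For the subdomain case specifically, a sketch of that kind of rule (untested, with hypothetical hostnames) might be:

```apache
RewriteEngine on
# 301 any request arriving on the subdomain over to the root domain
RewriteCond %{HTTP_HOST} ^subdomain\.mydomain\.org$ [NC]
RewriteRule ^(.*)$ http://www.mydomain.org/$1 [R=301,L]
```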
Here is what your htaccess might look like:
<IfModule mod_rewrite.c>
RewriteEngine on
# Redirect non-www to www
RewriteCond %{HTTP_HOST} !^www\.mydomain\.org [NC]
RewriteRule ^(.*)$ http://www.mydomain.org/$1 [R=301,L]
</IfModule>
-
Not to me, LOL. I think you'll need someone with a bit more expertise in this area than I have to assist in this case. Kyle, I'm sorry I couldn't offer more assistance, but I don't want to tell you something if I'm not 100% sure. I suspect one of the many bright SEOmozers will quickly come to the rescue on this one.
Andy
-
Hey Andy,
Herein lies the problem. Since the domain and subdomain point to the exact same place, they both utilize the same robots.txt file.
Does that make sense?
-
Hi Kyle,
Yes, you can block an entire subdomain via robots.txt. However, you'll need to create a robots.txt file and place it in the root of the subdomain, then add the code to direct the bots to stay away from the entire subdomain's content:
User-agent: *
Disallow: /
Hope this helps!
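As a quick sanity check (not from the thread; the hostname is hypothetical), Python's standard-library robots.txt parser shows that those two lines block every path on the host that serves them:

```python
from urllib.robotparser import RobotFileParser

# robots.txt rules served by the (hypothetical) subdomain.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# Every path on that host is now off-limits to compliant crawlers.
for path in ["/", "/index.html", "/category/some-post/"]:
    print(path, rp.can_fetch("*", "http://subdomain.website.com" + path))  # False
```

Since robots.txt is fetched per hostname, the root domain's own file is unaffected, which is why the conditional rewrite approach above works.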
Related Questions
-
Short description about our search results drop + forum moving to subdomain question.
Hello, here is our story. Our niche is mental health (psychology, psychotherapy, etc.). Our portal has thousands of genuine articles, a news section about mental health, research, job listings for specialists, a specialized bookstore with only psychology books, and the best forum in the country, with thousands of active members, self-help topics, etc. In our country (non-English), our portal was established in 2003. For more than 15 years we were no. 1 in our country, meaning we had the best brand name, hundreds of external authors writing unique content for our portal, and hundreds of no. 1 keywords in Google search results. According to Webmaster Tools, we had more than 1,000 keywords in positions 1 and 2 (we were ranking no. 1 for all the best keywords). Two years ago, we purchased the best domain in our niche. I'll use the example below (of course, the domains are not the real ones):
Intermediate & Advanced SEO | dodoni
We had: e-pizza.com and now we have: pizza.com
We did the appropriate redirects, but from day one we had around a 20-30% drop in search engines. After 6 months (which is the period Google officially mentions), we lost all credit from the old domain, and at that point we had another 20-30% drop in search results. Furthermore, with every Google core update we kept dropping; especially last May (the coronavirus-era core update), we had another huge drop. We do follow SEO guides; we have a dedicated server, good load speed, well-structured data, AMP, and a great presence in social media, with more than 130,000 followers. According to our investigation, we came to one conclusion only: our forum kills our SEO (of course, no one on our team can guarantee that this is the actual reason for the huge drop in May's core update). We believe the forum kills our SEO because it produces low-quality posts by members. For example, psychopharmacology is a very active section, and we believe Google is very "sensitive" about these kinds of posts and information. So here is the question: although the forum is very, very active, with thousands of new topics and posts every month, we are thinking of moving it to a subdomain from the subfolder it is in now.
We believe this will help our domain authority increase from 38, where it has been stuck for 2 years now. We believe that although this forum gave a great boost to the portal in the past 10-15 years, it somehow makes a negative impact now. To give more specific details: in all the SEO tools we run, the best keywords bringing visitors to us aren't "psychology", "psychotherapy", "mental health" and these kinds of top keywords anymore, but mostly ones from the forum, like "I want to proceed with a suicide", "I'm taking Efexor or Xanax and they have side effects", "why do I gain weight with the antidepressants I take", etc. Moving our forum to a subdomain will be some kind of pain, since it is a large community with thousands of backlinks that we must somehow handle in a proper way, plus a mobile application; things will have to change and will probably have some negative impact. Would that be, to your knowledge, a correct move from which our E-A-T will benefit, or, since Google will know the subdomain is still part of the same website/portal, will it handle it the same way as it does now? I have read hundreds of articles about forums in subdomains or subfolders, but none of them covers a case study like ours, since most articles talk about new forums, the best way to handle them, and where it's best to create them (subfolder or subdomain) when starting from scratch. Looking forward to your answers.
-
Large robots.txt file
We're looking at potentially creating a robots.txt with 1,450 lines in it. This would remove 100k+ pages from the crawl that are all old pages (I know the ideal would be to delete/noindex them, but that's not viable, unfortunately). The issue I'm thinking of is that a large robots.txt will either stop the robots.txt from being followed or will slow our crawl rate down. Does anybody have any experience with a robots.txt of that size?
Intermediate & Advanced SEO | ThomasHarvey
-
Keyword phrase for entire site
Hey everyone! I'm fairly new to SEO, but I have a large number of sites I need to SEO. I'm a tad confused as to how many keyword phrases I should use throughout my site. For example, my site is www.uluru.travel. I want to rank highly for the phrase 'uluru tours' throughout the site, as many of my pages list Uluru tours and people searching for this phrase are my type of customers. As you can see, I've tried to do some basic on-page SEO for that phrase by including it in the page title, headings, etc. But the entire site doesn't seem to rank very well. Would you suggest trying to target the 'uluru tours' phrase throughout the entire site, or just focusing a couple of pages on this term? Any advice is greatly appreciated, guys! Cheers
Intermediate & Advanced SEO | Mysites
-
How to handle a blog subdomain on the main sitemap and robots file?
Hi, I have some confusion about how our blog subdomain is handled in our sitemap. We have our main website, example.com, and our blog, blog.example.com. Should we list the blog subdomain URL in our main sitemap? In other words, is listing a subdomain allowed in the root sitemap? What does the final structure look like in terms of the sitemap and robots file? Specifically: in example.com/sitemap.xml, would I include a link to our blog subdomain (blog.example.com)? In example.com/robots.txt, would I include a link to BOTH our main sitemap and the blog sitemap? In blog.example.com/sitemap.xml, would I include a link to our main website URL (even though it's not a subdomain)? And blog.example.com/robots.txt: does a subdomain need its own robots file? I'm a technical SEO and understand the mechanics of much of on-page SEO, but for some reason I never found an answer to this specific question, and I am wondering how the pros do it. I appreciate your help with this.
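For what it's worth, since robots.txt applies per hostname, a common setup (hypothetical hostnames, not a definitive answer) is for each host to serve its own file, each pointing at that host's sitemap:

```
# blog.example.com/robots.txt (each hostname serves its own file)
User-agent: *
Disallow:

Sitemap: https://blog.example.com/sitemap.xml
```

The main domain's robots.txt would reference the main sitemap the same way.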
Intermediate & Advanced SEO | seo.owl
-
Robots.txt assistance
I want to block all the inner archive news pages of my website in robots.txt. We don't have the R&D capacity to set up rel=next/prev or create a central page that all inner pages would canonical back to, so this is the solution. The first page, which I want indexed, reads:
http://www.xxxx.news/?p=1
All subsequent pages, which I want blocked because they don't contain any new content, read:
http://www.xxxx.news/?p=2
http://www.xxxx.news/?p=3
etc. There are currently 245 inner archived pages, and I would like to set it up so that future pages will automatically be blocked, since we are always writing new news pieces. Any advice about what code I should use for this? Thanks!
Intermediate & Advanced SEO | theLotter
-
Robots.txt, does it need preceding directory structure?
Do you need the entire preceding path in robots.txt for it to match? E.g., I know that if I add Disallow: /fish to robots.txt, it will block:
/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything
But would it block:
en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
en/fish.php?id=anything
(examples taken from the Robots.txt Specifications)? I'm hoping it actually won't match; that way, writing this particular robots.txt will be much easier! Basically, I'm wanting to block many URLs that have BTS- in them, such as:
http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob
But I have other pages that I do not want blocked, in subfolders that also have BTS- in them, such as:
http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy
Thanks for listening!
Intermediate & Advanced SEO | Milian
-
Entire site code copied - potential SEO issues?
Hi folks, We have noticed that our site has been directly duplicated by another site. They have copied the entire code, including the JS, CSS and most of the HTML and have simply switched their own text and images onto the template. (We discovered it because they even copied over our analytics tracking and were appearing in our reports - duh!) Does anyone know if there are potential SEO issues in copying the code like that, or do duplicate content issues only apply to indexable HTML content? Thanks! Matthew (I didn't want to out them by sharing their URL because it could have been an external contractor that built the site and they probably had no idea.)
Intermediate & Advanced SEO | MattBarker
-
Will disallowing in robots.txt noindex a page?
Google has indexed a page I wish to remove. I would like to meta noindex it, but the CMS isn't allowing me to right now. Would a disallow in robots.txt simply stop them crawling, as I expect, or is it also an instruction to noindex? Thanks
Intermediate & Advanced SEO | Brocberry