Block an entire subdomain with robots.txt?
-
Is it possible to block an entire subdomain with robots.txt?
I write for a blog that has its root domain as well as a subdomain pointing to the exact same IP. Getting rid of the subdomain is not an option, so I'd like to explore other ways to avoid duplicate content. Any ideas?
-
Awesome! That did the trick -- thanks for your help. The site is no longer listed.
-
Fact is, the robots file alone will never work (the link has a good explanation why - short form: all it does is stop the bots from crawling the pages again; it does not remove what is already in the index).
Best to request removal, then wait a few days.
-
Yeah. As of yet, the site has not been de-indexed. We placed the conditional rule in .htaccess and are getting different robots.txt files for the domain and subdomain -- so that works. But I've never done this before, so I don't know how long it's supposed to take.
I'll try to verify via Webmaster Tools to speed up the process. Thanks
-
You should do a remove request in Google Webmaster Tools. You have to first verify the sub-domain then request the removal.
See this post on why the robots file alone won't work...
http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
-
Awesome. We used your second idea and so far it looks like it is working exactly how we want. Thanks for the idea.
Will report back to confirm that the subdomain has been de-indexed.
-
Option 1 could come with a small performance hit if you have a lot of .txt files being served, since each of them would then be run through PHP.
There shouldn't be any negative side effects to option 2 if the rewrite is clean (i.e. not accidentally a redirect) and the contents of the two files are robots.txt compliant.
Good luck
-
Thanks for the suggestion. I'll definitely have to do a bit more research into this one to make sure that it doesn't have any negative side effects before implementation
-
We have a plugin right now that places canonical tags, but unfortunately, the canonical for the subdomain points to the subdomain. I'll look around to see if I can tweak the settings
-
Sounds like (from other discussions) you may be stuck requiring a dynamic robots.txt file which detects what domain the bot is on and changes the content accordingly. This means the server has to run all .txt files as (I presume) PHP.
Or, you could conditionally rewrite the /robots.txt URL to a new file according to the subdomain:
RewriteEngine on
# Only when the request comes in on the subdomain...
RewriteCond %{HTTP_HOST} ^subdomain\.website\.com$ [NC]
# ...serve robots-subdomain.txt in place of robots.txt (internal rewrite, not a redirect)
RewriteRule ^robots\.txt$ robots-subdomain.txt [L]
Then add:
User-agent: *
Disallow: /
to the robots-subdomain.txt file
(untested)
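For the dynamic route, the host check itself is tiny. Here is a rough sketch of that logic in Python rather than PHP. The host name is the same placeholder used in the rule above, and `robots_txt_for_host`, `BLOCK_ALL` and `ALLOW_ALL` are made-up names for illustration; the "allow everything" rules for the root domain are an assumption, so swap in whatever the real robots.txt contains.

```python
# Hypothetical sketch: choose the robots.txt body from the request's Host header.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # served on the subdomain
ALLOW_ALL = "User-agent: *\nDisallow:\n"    # assumed rules for every other host

# Placeholder host name; substitute the real subdomain.
BLOCKED_HOSTS = {"subdomain.website.com"}


def robots_txt_for_host(host_header: str) -> str:
    """Return the robots.txt body for the given Host header value."""
    # Host names are case-insensitive and the header may carry a port ("host:8080").
    host = host_header.strip().lower().split(":")[0]
    return BLOCK_ALL if host in BLOCKED_HOSTS else ALLOW_ALL


if __name__ == "__main__":
    print(robots_txt_for_host("subdomain.website.com"))
    print(robots_txt_for_host("www.website.com"))
```

Whatever serves /robots.txt would call this with the incoming Host header. The rewrite above gets the same result without running any code per request.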
-
Placing canonical tags isn't an option? Detect that the page is being viewed through the subdomain, and if so, write the canonical tag on the page back to the root domain?
Or, just place a canonical tag on every page pointing back to the root domain (so the subdomain and root domain pages would both have them). Apparently, it's ok to have a canonical tag on a page pointing to itself. I haven't tried this, but if Matt Cutts says it's ok...
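Roughly, the second variant boils down to swapping the host and keeping the rest of the URL. A minimal sketch in Python, assuming the root domain is `www.website.com` (a placeholder) and using a made-up helper name `canonical_tag`; in practice this would live in the theme or plugin that writes the page head.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

ROOT_HOST = "www.website.com"  # assumed root domain; replace with the real one


def canonical_tag(page_url: str, root_host: str = ROOT_HOST) -> str:
    """Build a canonical <link> tag that always points at the root domain."""
    parts = urlsplit(page_url)
    # Keep scheme, path and query string; swap the host and drop any #fragment.
    canonical = urlunsplit((parts.scheme, root_host, parts.path, parts.query, ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


if __name__ == "__main__":
    # The same tag comes out whether the page is viewed on the subdomain or the root.
    print(canonical_tag("http://subdomain.website.com/my-post/"))
    print(canonical_tag("http://www.website.com/my-post/"))
```

Because the output does not depend on which host served the page, there is no need to detect the subdomain at all.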
-
Hey Ryan,
I wasn't directly involved with the decision to create the subdomain, but I'm told that it is necessary to create in order to bypass certain elements that were affecting the root domain.
Nevertheless, it is a blog and the users now need to log in to the subdomain in order to access the WordPress backend to bypass those elements. Traffic for the site still goes to the root domain.
-
They both point to the same location on the server? So there's not a different folder for the subdomain?
If that's the case then I suggest adding a rule to your .htaccess file to 301 the subdomain back to the main domain in exactly the same way people redirect from non-www to www or vice versa. However, you should ask why the server is configured to have a duplicate subdomain. You might just edit your Apache settings to get rid of that subdomain (usually done through a cPanel interface).
Here is what your .htaccess might look like:
<IfModule mod_rewrite.c>
RewriteEngine on
# Redirect non-www to www
RewriteCond %{HTTP_HOST} !^www\.mydomain\.org [NC]
RewriteRule ^(.*)$ http://www.mydomain.org/$1 [R=301,L]
</IfModule>
-
Not to me LOL. I think you'll need someone with a bit more expertise in this area than I have to assist in this case. Kyle, I'm sorry I couldn't offer more assistance... but I don't want to tell you something if I'm not 100% sure. I suspect one of the many bright SEOmozers will quickly come to the rescue on this one.
Andy
-
Hey Andy,
Herein lies the problem. Since the domain and subdomain point to the exact same place, they both utilize the same robots.txt file.
Does that make sense?
-
Hi Kyle,
Yes, you can block an entire subdomain via robots.txt. However, you'll need to create a robots.txt file and place it in the root of the subdomain, then add the code to direct the bots to stay away from the entire subdomain's content:
User-agent: *
Disallow: /
Hope this helps.
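If you want to sanity-check rules like these before uploading them, Python's standard-library `urllib.robotparser` can evaluate them offline. A minimal sketch: `is_blocked` and the example URLs are made up for illustration, and it only shows how a compliant parser reads the file, not how any particular search engine will treat pages it has already indexed.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set for the subdomain.
RULES = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs; every path on the host should be reported as blocked.
    for url in ("http://subdomain.website.com/",
                "http://subdomain.website.com/some-post/"):
        print(url, is_blocked(RULES, url))
```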