Robots.txt for subdomain
-
Hi there Mozzers!
I have a subdomain with duplicate content and I'd like to remove these pages from the mighty Google index. The problem is: the website is built in Drupal and this subdomain does not have its own robots.txt.
So I want to ask you how to disallow and noindex this subdomain. Is it possible to add this to the root robots.txt:
User-agent: *
Disallow: /subdomain.root.nl/

User-agent: Googlebot
Noindex: /subdomain.root.nl/

Thank you in advance!
Partouter
-
A robots.txt file only applies to the subdomain it is placed on.
You need to create a separate robots.txt for each subdomain; Drupal allows this.
It must be located in the root directory of your subdomain (e.g. /public_html/subdomain/) so it can be accessed at http://subdomain.root.nl/robots.txt.
Add the following lines to that robots.txt file:
User-agent: *
Disallow: /
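If the subdomain points at the same Drupal docroot as the main site (which would explain why it currently shows the main site's robots.txt), one common way to serve a different file is a host-based rewrite. A minimal sketch for Apache, assuming mod_rewrite is enabled and a hypothetical robots_subdomain.txt placed in that docroot:
# .htaccess in the shared document root
RewriteEngine On
RewriteCond %{HTTP_HOST} ^subdomain\.root\.nl$ [NC]
RewriteRule ^robots\.txt$ /robots_subdomain.txt [L]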
As an alternative, you can use the robots <META> tag on each page, or redirect the subdomain to a directory like root.nl/subdomain and disallow that directory in the main robots.txt. Personally, I don't recommend that.
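For reference, the robots meta tag mentioned above would go in the <head> of every page on the subdomain, along these lines:
<meta name="robots" content="noindex, nofollow">
-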
Not sure how your server is configured but mine is set up so that subdomain.mydomain.com is a subdirectory like this:
http://www.mydomain.com/subdomain/
in robots.txt you would simply need to put
User-agent: *
Disallow: /subdomain/

Others may have a better way though.
HTH
Steve
Related Questions
-
Robots.txt error
The Moz crawler is not able to access the robots.txt due to a server error. Please advise on how to tackle the server error.
Technical SEO | Shanidel0
-
How to use robots.txt to block areas on a page?
Hi, Across the categories/product pages on our site there is an archives/shipping info section, and the text is always the same. Would this be treated as duplicate content and be harmful for SEO? How can I alter robots.txt to tell Google not to crawl that particular text? Thanks for any advice!
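Worth noting: robots.txt blocks whole URLs, not sections within a page, so it cannot hide one block of text. One workaround is to serve the repeated text from its own URL (for example via an iframe) and disallow that URL; a sketch with a hypothetical path:
User-agent: *
Disallow: /snippets/shipping-info.html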
Technical SEO | LauraHT0
-
Odd scenario: subdomain not indexed nor cached, reason?
hi all, hopefully somebody can help me with this issue 🙂 Six months ago a number of pages hosted at the domain level were moved to the subdomain level with 301 redirects, plus some others were created from scratch (at the subdomain level too). What happens is that not only are the new URLs at the subdomain level not indexed nor cached, but the old URLs are still indexed in Google, although clicking on them brings you to the new URLs via the 301 redirect. The question is: with 301 redirects to the new URLs and no issues with robots.txt, meta robots etc., why are the new URLs still de-indexed? I might remind you that a few pages (100 or so) were created from scratch, but they are also not indexed. The only issue found across the pages is the no-cache lines of code, set as follows:
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
I am not familiar with cache-control lines. Can this be an issue for correct indexing? Thanks in advance, Dario
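One quick diagnostic, sketched with a hypothetical URL: fetch the response headers directly and look for anything beyond the cache directives, in particular an X-Robots-Tag header, which would explain de-indexing even when robots.txt and meta robots look clean:
curl -sI http://subdomain.example.com/some-page | grep -iE 'cache-control|pragma|x-robots-tag'
For what it's worth, Cache-Control and Pragma headers govern HTTP caching, not whether Google indexes a page.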
Technical SEO | Mrlocicero0
-
Best use of robots.txt for "garbage" links from Joomla!
I recently started out on SEOmoz and am trying to do some cleanup according to the campaign report I received. One of my biggest gripes is the "Duplicate Page Content" issue: right now I have over 200 pages with duplicate page content. This is triggered because SEOmoz has snagged auto-generated links from my site. My site has a "send to friend" feature, and every time someone wants to send an article or a product to a friend via email, a pop-up appears. It seems the pop-up pages have been snagged by the SEOmoz spider; however, these pages are something I would never want indexed in Google, so I just want to get rid of them. Now to my question: I guess the best solution is to make a general rule via robots.txt, so that these pages are not indexed or considered by Google at all. But how do I do this? What should my syntax be? A lot of the links look like this, but have different ID numbers according to the product being sent: http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167 I guess I need a rule that grabs the following and makes Google ignore links that contain this: view=send_friend
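A sketch of the kind of rule that should cover this: Googlebot (and Bing) support the * wildcard in Disallow paths, so a pattern keyed on the query parameter quoted above would catch all the ID variations:
User-agent: *
Disallow: /*view=send_friend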
Technical SEO | teleman0
-
Can't find mistake in robots.txt
Hi all, we recently filled our robots.txt file to prevent some directories from being crawled. It looks like:
User-agent: *
Disallow: /Views/
Disallow: /login/
Disallow: /routing/
Disallow: /Profiler/
Disallow: /LILLYPROFILER/
Disallow: /EventRweKompaktProfiler/
Disallow: /AccessIntProfiler/
Disallow: /KellyIntProfiler/
Disallow: /lilly/
Now, as Google Webmaster Tools hasn't updated our robots.txt yet, I checked our robots.txt in some checkers. They tell me that the User-agent: * line contains an error. Example:
Line 1: Syntax error! Expected <field>: <value>
1: User-agent: *
I checked other robots.txt files written the same way --> they work, according to the checkers... Where is the mistake???
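One culprit worth ruling out (an assumption, since the directives themselves look valid): an invisible UTF-8 byte-order mark at the start of the file makes strict parsers reject line 1 even though it displays as a normal User-agent line. A quick check on a Unix shell:
hexdump -C robots.txt | head -n 1
If the first three bytes shown are ef bb bf, re-save the file as UTF-8 without a BOM.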
Technical SEO | accessKellyOCG0
-
Getting home page content at top of what robots see
When I click on the text-only cache of nlpca(dot)com on the home page http://webcache.googleusercontent.com/search?q=cache:UIJER7OJFzYJ:www.nlpca.com/&hl=en&gl=us&strip=1 our H1 and body content are at the very bottom. How do we get the H1 and content to the top of what the robots see? Thanks!
Technical SEO | BobGW0
-
Robots.txt Syntax
Does the order of the robots.txt syntax matter for SEO? For example, are there potential problems with this format?
User-agent: *
Sitemap:
Disallow: /form.htm
Allow: /
Disallow: /cgnet_directory
Technical SEO | RodrigoStockebrand0
-
Subdomains
Hi, I have recently started working in-house for a company, and one site development was started and completed just as I joined. A new area of the site has been developed, but the developers built this new section in PHP, which (they tell me) cannot be hosted on the Windows server the site is running on. Is this correct? They want to add the new section as a subdomain, http://newarea.example.co.uk/, whereas I would have preferred it added as a subfolder. I plan to ensure that future developments do not have this problem, but is the best solution to work with the subdomain (in this instance it may not be too bad, as it is a niche area of the site), or can I redirect the pages hosted on the subdomain to a subfolder, and is this recommended? Thanks for your time.
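On the technical point: a subdomain's pages can also be served under a subfolder without moving the app, by reverse-proxying the folder to the PHP host. A sketch for Apache, assuming mod_proxy/mod_proxy_http is available (on a Windows/IIS front end the equivalent is URL Rewrite plus Application Request Routing):
ProxyPass /newarea/ http://newarea.example.co.uk/
ProxyPassReverse /newarea/ http://newarea.example.co.uk/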
Technical SEO | LSLPS0