Subdomain Removal in Robots.txt with Conditional Logic?
-
I would like to know if there is a way to add conditional logic to the robots.txt file, so that when we push from DEV to PRODUCTION and robots.txt gets pushed along with everything else, we don't have to remember either to exclude robots.txt from the push or to edit it after it goes live.
My specific situation is this:
I have www.website.com, dev.website.com and new.website.com, and somehow Google has indexed dev.website.com and new.website.com. I'd like these removed from Google's index, as they're causing duplicate content.
Should I:
a) Add two new GWT entries for dev.website.com and new.website.com and VERIFY ownership. If I do this, then when the files are pushed to LIVE, won't they still contain the verification meta tag for the DEV version even though the site is now LIVE? (Hope that makes sense.)
b) Write a robots.txt file that specifies "Disallow: dev.website.com/". Is that even possible? I have only seen examples of Disallow with a "/" at the beginning...
Hope this makes sense; I can really use the help! I'm on a Windows Server 2008 box running ColdFusion websites.
-
Here's how I dealt with a similar situation in the past.
I put a robots.txt on each of the dev subdomains and on the live domain. The dev subdomains' robots.txt excluded the entire subdomain, and the subdomains were verified in GWT and removed from the index as needed.
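For reference, the block-everything robots.txt served on each dev subdomain is just the standard two-line form (robots.txt only applies to the host it's served from, so no hostname appears in it):

```
User-agent: *
Disallow: /
```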
I made the live domain's robots.txt read-only so it didn't get overwritten. I should have made the dev subdomains' robots.txt read-only as well, since they sometimes got refreshed with live content (there was a UGC database that would occasionally get copied to a dev subdomain, and the live robots.txt would get copied over with it, leaving the dev subdomain open to indexing).
I also set up a code monitor that checks the contents of all of the robots.txt files daily and sends me an email if anything changes.
Not perfect, but I was at least able to catch changes soon after they happened, and it prevented a few mistakes.
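A minimal sketch of that daily check, assuming Python with hashes compared against a stored baseline (the hostnames are placeholders from the question, and the email step is left as a returned list rather than an actual mail call):

```python
import hashlib
import json
import urllib.request
from pathlib import Path

HOSTS = ["www.website.com", "dev.website.com", "new.website.com"]  # placeholder hostnames
BASELINE = Path("robots_baseline.json")  # last-seen hash per host

def fetch_robots(host):
    """Download robots.txt for a host and return its SHA-256 hex digest."""
    with urllib.request.urlopen(f"http://{host}/robots.txt", timeout=10) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def check_for_changes(fetch=fetch_robots):
    """Compare current hashes to the stored baseline and return changed hosts."""
    baseline = json.loads(BASELINE.read_text()) if BASELINE.exists() else {}
    changed = []
    for host in HOSTS:
        digest = fetch(host)
        if baseline.get(host) != digest:
            changed.append(host)
        baseline[host] = digest
    BASELINE.write_text(json.dumps(baseline))
    return changed  # a real monitor would email this list instead of returning it
```

Run it from a daily scheduled task; any non-empty result means a robots.txt was altered since the last run.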
-
You can't put logic in robots.txt, and subdomains are treated as separate sites, so you need to create a separate robots.txt file for each subdomain and block each one in its own file.
You'll also need to add the Google verification code and verify each subdomain; then in GWT you can request to have the subdomain removed from Google's index. That's the fastest way.
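That said, while the robots.txt file itself can't contain logic, the server can vary what it returns per hostname, which solves the "don't push the wrong file" problem entirely. A rough ColdFusion sketch of the idea, assuming an IIS rewrite rule maps /robots.txt to this template (the hostnames are the ones from the question):

```cfm
<!--- robots.cfm: serve environment-specific robots rules based on the requested host --->
<cfcontent type="text/plain">
<cfif findNoCase("dev.website.com", cgi.http_host) OR findNoCase("new.website.com", cgi.http_host)>
User-agent: *
Disallow: /
<cfelse>
User-agent: *
Disallow:
</cfif>
```

This is a sketch, not a drop-in file: you'd want to verify the rewrite mapping and check the exact output (whitespace handling in CF templates) before relying on it, but it means the same codebase does the right thing on DEV, NEW and LIVE.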