Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Subdomain Removal in Robots.txt with Conditional Logic??
-
I would like to see if there is a way to add conditional logic to the robots.txt file so that when we push from DEV to PRODUCTION and the robots.txt file is pushed, we don't have to remember to NOT push the robots.txt file OR edit it when it goes live.
My specific situation is this:
I have www.website.com, dev.website.com and new.website.com, and somehow Google has indexed dev.website.com and new.website.com. I'd like these removed from Google's index because they are causing duplicate content.
Should I:
a) add 2 new GWT entries for DEV.website.com and NEW.website.com and VERIFY ownership - if I do this, then when the files are pushed to LIVE won't the files contain the VERIFY META CODE for the DEV version even though it's now LIVE? (hope that makes sense)
b) write a robots.txt file that specifies "DISALLOW: DEV.website.com/"? Is that possible? I have only seen examples of DISALLOW with a "/" at the beginning...
Hope this makes sense, can really use the help! I'm on a Windows Server 2008 box running ColdFusion websites.
-
Here's how I dealt with a similar situation in the past.
I put a separate robots.txt on each of the dev subdomains and on the live domain. The dev subdomains' robots.txt excluded the entire subdomain, and the subdomains were verified in GWT and removed from the index as needed.
I made the live robots.txt read-only so it didn't get overwritten. I should have made the dev subdomains' robots.txt read-only as well, since the dev subdomains sometimes got refreshed with the live content (there was a UGC database that would occasionally get copied to a dev subdomain, and the live robots.txt would get copied over too, so the dev subdomain got indexed).
I set up a code monitor that checks the contents of all of the robots.txt files daily and sends me an email if anything has changed.
Not perfect, but I was at least able to catch changes soon after they happened, and prevented a few changes.
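For anyone wanting to build a similar daily check, here is a minimal sketch in Python. It is not the monitor I actually used. The function names, the `http://` scheme and the idea of storing a hash per host are all assumptions for illustration, and the email step is left out.

```python
import hashlib
import urllib.request


def fetch_robots(host: str, timeout: float = 10.0) -> str:
    """Download robots.txt for one host (the http:// scheme is an assumption)."""
    url = f"http://{host}/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fingerprint(text: str) -> str:
    """Hash the file body, ignoring line-ending differences and outer whitespace."""
    normalized = text.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_hosts(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return hosts whose fingerprint is new or differs from the last run."""
    return sorted(h for h, digest in current.items() if previous.get(h) != digest)
```

Run once a day from a scheduler, this would call `fetch_robots` for each subdomain, compare the new fingerprints with the ones saved from the previous run, and send an alert whenever `changed_hosts` returns a non-empty list.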
-
You can't put logic in robots.txt, and subdomains are seen as different sites, so you need to create a separate robots.txt file for each subdomain and block each subdomain in its own file.
You'll also need to add the Google verification code to the subdomains and verify them; then in GWMT you can request to have each subdomain removed from Google's index. That's the fastest way.
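To illustrate the per-subdomain approach: Disallow rules match URL paths, not hostnames, so each host needs its own file. The Python sketch below picks a robots.txt body by hostname and checks the result with the standard library's `urllib.robotparser`. The `robots_for` and `is_crawlable` helpers and the `LIVE_HOSTS` set are hypothetical names, and using something like this in a deploy step is an assumption, not something described in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file bodies: block everything on non-production hosts,
# allow everything on the live host.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

LIVE_HOSTS = {"www.website.com"}  # assumption: only this host should be crawled


def robots_for(host: str) -> str:
    """Return the robots.txt body that should be deployed on the given host."""
    return ALLOW_ALL if host.lower() in LIVE_HOSTS else BLOCK_ALL


def is_crawlable(host: str, path: str = "/", agent: str = "Googlebot") -> bool:
    """Parse the host's robots.txt body and ask whether the agent may fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_for(host).splitlines())
    return parser.can_fetch(agent, f"http://{host}{path}")
```

A deploy script could write `robots_for(host)` to each environment instead of copying one shared file, which avoids having to remember to edit robots.txt after a push. Blocking crawling does not by itself remove pages that are already indexed; the removal request in GWMT is still needed for that.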