Robots.txt and robots meta
-
I have an odd situation. I have a CMS that has a global robots.txt which has the generic
User-Agent: *
Allow: /
I also have one CMS site that needs to not be indexed ever. I've read in various pages (like http://www.jesterwebster.com/robots-txt-vs-meta-tag-which-has-precedence/22 ) that robots.txt always wins over meta, but I have also read that robots.txt indicates spiderability whereas meta can control indexation. I just want the site to not be indexed. Can I leave the robots.txt as is and still put NOINDEX in the robots meta?
-
I see. Have you considered putting it behind an htpasswd?
-
I can control it (it's a custom piece of software) but it's not as easy a fix as adding a meta to the template.
The main problem is we have a junk TLD we use to test some new ideas off the live server (lets clients give us feedback) but it gets spidered and indexed and starts ranking for client sites before they're ready to live in their own TLD. This means we have to compete against ourselves (even with a 301). There's nothing sensitive or it would live behind a password.
-
Do you need to control access to the site beyond the SERPs? I would not rely on robots.txt to shield any sensitive data.
For a breakdown of robots.txt and robots meta tags, check out http://www.robotstxt.org/robotstxt.html and http://www.searchtools.com/robots/robots-meta.html/, and for a great post on using these standards in SEO, check out http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
I am also concerned that you are unable to control your robots.txt! If your CMS doesn't let you do that and overwrites it when you change it manually, you have some major control problems on your hands that you should remedy.
-
Blocking the site in robots.txt will not guarantee that it stays out of Google's index. I think you can use the meta robots NOINDEX to make sure Google will not show your pages when someone tries to Google them.
It is important to say that Googlebot and other spiders will continue to visit your pages. They have to: a crawler can only see the NOINDEX if robots.txt lets it fetch the page, so leaving the Allow in place is what you want here.
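If it helps, here is a rough way to sanity-check that setup: a minimal Python sketch, not anything the CMS or Google provides. It checks two things with the standard library only: that robots.txt still lets a crawler fetch the page, and that the page carries a robots meta with noindex. The page HTML, the staging URL, and the helper names (can_crawl, has_noindex) are made up for illustration; the robots.txt content is the generic one from the question.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# The generic robots.txt from the question.
ROBOTS_TXT = """\
User-agent: *
Allow: /
"""

# Hypothetical template output for a page on the test domain.
PAGE_HTML = """\
<html><head>
<title>Staging preview</title>
<meta name="robots" content="noindex">
</head><body>Preview</body></html>
"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def can_crawl(robots_txt, url, agent="*"):
    """True if the given robots.txt text allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def has_noindex(html):
    """True if the HTML contains a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical URL on the test domain.
url = "http://staging.example.test/some-page"
crawlable = can_crawl(ROBOTS_TXT, url)
noindex = has_noindex(PAGE_HTML)
print("crawlable:", crawlable, "| noindex:", noindex)
```

You want both to come back true. If can_crawl is false, the spider never gets to read the meta tag, and the URL can still end up listed from links pointing at it.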