Robots.txt and robots meta
-
I have an odd situation. I have a CMS that has a global robots.txt which has the generic
User-Agent: *
Allow: /I also have one CMS site that needs to not be indexed ever. I've read in various pages (like http://www.jesterwebster.com/robots-txt-vs-meta-tag-which-has-precedence/22 ) that robots.txt always wins over meta, but I have also read that robots.txt indicates spiderability whereas meta can control indexation. I just want the site to not be indexed. Can I leave the robots.txt as is and still put NOINDEX in the robots meta?
-
I see. Have you considered putting it behind an htpasswd?
-
I can control it (it's a custom piece of software) but it's not as easy a fix as adding a meta to the template.
The main problem is we have a junk TLD we use to test some new ideas off the live server (lets clients give us feedback) but it gets spidered and indexed and starts ranking for client sites before they're ready to live in their own TLD. This means we have to compete against ourselves (even with a 301). There's nothing sensitive or it would live behind a password.
-
Do you need to control access to the site beyond the SERPS? I would not rely on robots.txt to shield any sensitive data.
For a breakdown of robots.txt and robots meta-tags checkout: http://www.robotstxt.org/robotstxt.html and http://www.searchtools.com/robots/robots-meta.html/, and for a great post on using these standards in SEO check out: http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
I am also concerned that you are unable to control your robots.txt! If your CMS doesn't let you do that and overwrites it when you change it manually, you have some major control problems on your hands that you should remedy.
-
Blocking it at the robots.txt will not guarantee that your site will not appear at Google's index. I think you can use meta robots NOINDEX to guarantee that Google will not show your pages when someone try to Google it.
It is important to say that Googlebot and other spiders will continue to visit your page.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Using the £ sign in meta title
Is it a bad idea to include a £sign in my meta title? It currently has a price incentive in it. Does Google not like this from organic traffic titles/ meta descriptions? Thanks,
Technical SEO | | EdLongley0 -
Duplicate content: using the robots meta tag in conjunction with the canonical tag?
We have a WordPress instance on an Apache subdomain (let's say it's blog.website.com) alongside our main website, which is built in Angular. The tech team is using Akamai to do URL rewrites so that the blog posts appear under the main domain (website.com/more-keywords/here). However, due to the way they configured the WordPress install, they can't do a wildcard redirect under htaccess to force all the subdomain URLs to appear as subdirectories, so as you might have guessed, we're dealing with duplicate content issues. They could in theory do manual 301s for each blog post, but that's laborious and a real hassle given our IT structure (we're a financial services firm, so lots of bureaucracy and regulation). In addition, due to internal limitations (they seem mostly political in nature), a robots.txt file is out of the question. I'm thinking the next best alternative is the combined use of the robots meta tag (no index, follow) alongside the canonical tag to try to point the bot to the subdirectory URLs. I don't think this would be unethical use of either feature, but I'm trying to figure out if the two would conflict in some way? Or maybe there's a better approach with which we're unfamiliar or that we haven't considered?
Technical SEO | | prasadpathapati0 -
Do I have a robots.txt problem?
I have the little yellow exclamation point under my robots.txt fetch as you can see here- http://imgur.com/wuWdtvO This version shows no errors or warnings- http://imgur.com/uqbmbug Under the tester I can currently see the latest version. This site hasn't changed URLs recently, and we haven't made any changes to the robots.txt file for two years. This problem just started in the last month. Should I worry?
Technical SEO | | EcommerceSite0 -
HTTP Status showing up in opensiteexplorer top pages as blocked by robot.txt file
I am trying to find an answer to this question it has alot of url on this page with no data when i go into the data source and search for noindex or robot.txt but the site is visible in the search engines ?
Technical SEO | | ReSEOlve0 -
Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?
I've got several URL's that I need to disallow in my robots.txt file. For example, I've got several documents that I don't want indexed and filters that are getting flagged as duplicate content. Rather than typing in thousands of URL's I was hoping that wildcards were still valid.
Technical SEO | | mkhGT0 -
Meta Description
Working with a business that is having some real issues. They had some client information that was showing up in the meta description. Personal phone numbers for example. Our developers removed all the information from the pages in question two days ago, but we are still seeing the info in the meta description. Any idea how long this will take to be recrawled and fixed? Anything I can do to get recrawled sooner? Also, this is only happening in Bing/Yahoo and not in Google. Thanks for any help you can provide!
Technical SEO | | PGD20110 -
Meta keywords
I heard that Google doesn't use 'meta keywords' for ranking. Is that true and does it harm to use the meta tag? And how do the other big SEs use the tag?
Technical SEO | | kortingsplanet0 -
Site not being Indexed that fast anymore, Is something wrong with this Robots.txt
My wordpress site's robots.txt used to be this: User-agent: * Disallow: Sitemap: http://www.domainame.com/sitemap.xml.gz I also have all in one SEO installed and other than posts, tags are also index,follow on my site. My new posts used to appear on google in seconds after publishing. I changed the robots.txt to following and now post indexing takes hours. Is there something wrong with this robots.txt? User-agent: * Disallow: /cgi-bin Disallow: /wp-admin Disallow: /wp-includes Disallow: /wp-content/plugins Disallow: /wp-content/cache Disallow: /wp-content/themes Disallow: /wp-login.php Disallow: /wp-login.php Disallow: /trackback Disallow: /feed Disallow: /comments Disallow: /author Disallow: /category Disallow: */trackback Disallow: */feed Disallow: */comments Disallow: /login/ Disallow: /wget/ Disallow: /httpd/ Disallow: /*.php$ Disallow: /? Disallow: /*.js$ Disallow: /*.inc$ Disallow: /*.css$ Disallow: /*.gz$ Disallow: /*.wmv$ Disallow: /*.cgi$ Disallow: /*.xhtml$ Disallow: /? Disallow: /*?Allow: /wp-content/uploads User-agent: TechnoratiBot/8.1 Disallow: ia_archiverUser-agent: ia_archiver Disallow: / disable duggmirror User-agent: duggmirror Disallow: / allow google image bot to search all imagesUser-agent: Googlebot-Image Disallow: /wp-includes/ Allow: /* # allow adsense bot on entire siteUser-agent: Mediapartners-Google* Disallow: Allow: /* Sitemap: http://www.domainname.com/sitemap.xml.gz
Technical SEO | | ideas1230