Robots.txt and robots meta
-
I have an odd situation. I have a CMS with a global robots.txt that contains the generic
User-Agent: *
Allow: /
I also have one CMS site that must never be indexed. I've read on various pages (like http://www.jesterwebster.com/robots-txt-vs-meta-tag-which-has-precedence/22) that robots.txt always wins over meta, but I have also read that robots.txt controls whether a page can be spidered, whereas meta controls indexation. I just want the site to not be indexed. Can I leave the robots.txt as is and still put NOINDEX in the robots meta?
-
I see. Have you considered putting it behind an htpasswd?
-
I can control it (it's a custom piece of software), but it's not as easy a fix as adding a meta tag to the template.
The main problem is that we have a junk TLD we use to test new ideas off the live server (it lets clients give us feedback), but it gets spidered and indexed and starts ranking for client sites before they're ready to go live on their own TLD. This means we have to compete against ourselves (even with a 301). There's nothing sensitive on it, or it would live behind a password.
-
Do you need to control access to the site beyond the SERPs? I would not rely on robots.txt to shield any sensitive data.
For a breakdown of robots.txt and robots meta tags, check out http://www.robotstxt.org/robotstxt.html and http://www.searchtools.com/robots/robots-meta.html/, and for a great post on using these standards in SEO, check out http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
I am also concerned that you are unable to control your robots.txt! If your CMS doesn't let you do that and overwrites it when you change it manually, you have some major control problems on your hands that you should remedy.
-
Blocking the site in robots.txt will not guarantee that it stays out of Google's index: a blocked URL can still be listed if other pages link to it. I think you can use the meta robots NOINDEX tag to make sure Google does not show your pages when someone searches for them. The crawler has to fetch a page to see that tag, so leave robots.txt as Allow: / rather than disallowing the site.
Note that Googlebot and other spiders will continue to visit your pages.
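To make the interaction concrete, here is a minimal sketch in Python (standard library only) of the check described above: a page can only be dropped from the index via meta robots if robots.txt lets the crawler fetch it and the page actually carries NOINDEX. The function name `can_be_deindexed`, the class `RobotsMetaParser`, and the `test.example` URL are hypothetical names made up for this illustration, and real crawlers may interpret edge cases differently from Python's `urllib.robotparser`.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content looks like "NOINDEX, NOFOLLOW"; normalise to lowercase tokens
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def can_be_deindexed(robots_txt, html, url, agent="*"):
    """True when the crawler may fetch the page AND the page carries noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    meta = RobotsMetaParser()
    meta.feed(html)
    return rules.can_fetch(agent, url) and "noindex" in meta.directives


# Hypothetical test-domain page carrying the meta tag from the question.
page = '<html><head><meta name="robots" content="NOINDEX"></head></html>'
url = "http://test.example/page.html"

# Global robots.txt left as is: the tag can be read by the crawler.
print(can_be_deindexed("User-Agent: *\nAllow: /", page, url))     # True
# Site disallowed in robots.txt: the crawler never fetches the tag.
print(can_be_deindexed("User-Agent: *\nDisallow: /", page, url))  # False
```

The second call is the case to avoid: with the site disallowed, the NOINDEX tag is never read, so the URL can still end up listed.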