Is robots met tag a more reliable than robots.txt at preventing indexing by Google?
-
What's your experience of using robots meta tag v robots.txt when it comes to a stand alone solution to prevent Google indexing?
I am pretty sure robots meta tag is more reliable - going on own experiences, I have never experience any probs with robots meta tags but plenty with robots.txt as a stand alone solution.
Thanks in advance, Luke
-
Hi there,
Regarding the X-Robots tag. We have had a couple of sites that were disallowed in the robots.txt have their PDF, Doc etc files get indexed. I understand the reasoning for this. I would like to remove the disallow in the robots.txt and use the X-robots tag to noindex all pages as well as PDF, Doc files etc. This is for a ngnix configuation. Does anyone know what the written x-robots tag would look like in this case?
-
Test for what works for your site.
Use tools below
- https://www.deepcrawl.com/ (will give you one free full crawl)
- https://www.screamingfrog.co.uk/seo-spider/ (free up to 500 URLs)
- http://urlprofiler.com/ (14 days free try)
- https://www.deepcrawl.com/blog/best-practice/noindex-disallow-nofollow/
- https://www.screamingfrog.co.uk/seo-spider/user-guide/general/#robots-txt
- https://www.deepcrawl.com/blog/best-practice/noindex-and-google/
So much info
https://www.deepcrawl.com/blog/tag/robots-txt/
Thomas
-
Hi Luke,
In order to exclude individual pages from search engine indices, the noindex meta tag
is actually superior to robots.txt.
But X-Robots-Tag header tag is the best but much hader to use.
Block all web crawlers from all content
User-agent: * Disallow: /
Using the
robots.txt
file, you can tell a spider where it cannot go on your site. You can not tell a search engine which URLs it cannot show in the search results. This means that not allowing a search engine to crawl an URL – called “blocking” it – does not mean that URL will not show up in the search results. If the search engine finds enough links to that URL, it will include it; it will just not know what’s on that page.If you want to reliably block a page from showing up in the search results, you need to use a meta robots
noindex
tag. That means the search engine has to be able to index that page and find thenoindex
tag, so the page should not be blocked byrobots.txt
a
robots.txt
file does. In a nutshell, what it does is tell search engines to not crawl a particular page, file or directory of your website.Using this, helps both you and search engines such as Google. By not providing access to certain, unimportant areas of your website, you can save on your crawl budget and reduce load on your server.
Please note that using the
robots.txt
file to hide your entire website for search engines is definitely not recommended.see big photo: http://i.imgur.com/MM7hM4g.png
_(…)_ _(…)_
The robots meta tag in the above example instructs all search engines not to show the page in search results. The value of the
name
attribute (robots
) specifies that the directive applies to all crawlers. To address a specific crawler, replace therobots
value of thename
attribute with the name of the crawler that you are addressing. Specific crawlers are also known as user-agents (a crawler uses its user-agent to request a page.) Google's standard web crawler has the user-agent name.Googlebot
To prevent only Googlebot from crawling your page, update the tag as follows:This tag now instructs Google (but no other search engines) not to show this page in its web search results. Both the and
name
the attributescontent
are non-case sensitive.Search engines may have different crawlers for different properties or purposes. See the complete list of Google's crawlers. For example, to show a page in Google's web search results, but not in Google News, use the following meta tag:
If you need to specify multiple crawlers individually, it's okay to use multiple robots meta tags:
If competing directives are encountered by our crawlers we will use the most restrictive directive we find.
irective. This basically means that if you want to really hide something from the search engines, and thus from people using search,
robots.txt
won’t suffice.Indexer directives
Indexer directives are directives that are set on a per page and/or per element basis. Up until July 2007, there were two directives: the microformat rel=”nofollow”, which means that that link should not pass authority / PageRank, and the Meta Robots tag.
With the Meta Robots tag, you can really prevent search engines from showing pages you want to keep out of the search results. The same result can be achieved with the X-Robots-Tag HTTP header. As described earlier, the X-Robots-Tag gives you more flexibility by also allowing you to control how specific file(types) are indexed.
Example uses of the X-Robots-Tag
Using the
X-Robots-Tag
HTTP headerThe
X-Robots-Tag
can be used as an element of the HTTP header response for a given URL. Any directive that can be used in an robots meta tag can also be specified as anX-Robots-Tag
. Here's an example of an HTTP response with anX-Robots-Tag
instructing crawlers not to index a page:HTTP/1.1 200 OK Date: Tue, 25 May 2010 21:42:43 GMT _(…)_ **X-Robots-Tag: noindex** _(…)_
Multiple
X-Robots-Tag
headers can be combined within the HTTP response, or you can specify a comma-separated list of directives. Here's an example of an HTTP header response which has anoarchive
X-Robots-Tag
combined with anunavailable_after
X-Robots-Tag
.HTTP/1.1 200 OK Date: Tue, 25 May 2010 21:42:43 GMT _(…)_ **X-Robots-Tag: noarchive X-Robots-Tag: unavailable_after: 25 Jun 2010 15:00:00 PST** _(…)_
The
X-Robots-Tag
may optionally specify a user-agent before the directives. For instance, the following set ofX-Robots-Tag
HTTP headers can be used to conditionally allow showing of a page in search results for different search engines:HTTP/1.1 200 OK Date: Tue, 25 May 2010 21:42:43 GMT _(…)_ **X-Robots-Tag: googlebot: nofollow X-Robots-Tag: otherbot: noindex, nofollow** _(…)_
Directives specified without a user-agent are valid for all crawlers. The section below demonstrates how to handle combined directives. Both the name and the specified values are not case sensitive.
- https://moz.com/learn/seo/robotstxt
- https://yoast.com/ultimate-guide-robots-txt/
- https://moz.com/blog/the-wonderful-world-of-seo-metatags
- https://yoast.com/x-robots-tag-play/
- https://www.searchenginejournal.com/x-robots-tag-simple-alternate-robots-txt-meta-tag/67138/
- https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
I hope this helps,
Tom
-
If you've recently added the "noindex" meta, get the page fetched in GWT. Google can't act if it doesn't see the tag.
-
Hi Luke,
It's a pretty common misconception that the robots.txt will prevent indexing. It's only purpose is actually to prevent crawling, anything disallowed in there is still up for indexing if it's linked to elsewhere. If you want something deindexed, your best bet is the robots meta tag, but make sure you allow crawling of the URLs to give search engine bots an opportunity to see the tag.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Pages excluded from Google's index due to "different canonicalization than user"
Hi MOZ community, A few weeks ago we noticed a complete collapse in traffic on some of our pages (7 out of around 150 blog posts in question). We were able to confirm that those pages disappeared for good from Google's index at the end of January '18, they were still findable via all other major search engines. Using Google's Search Console (previously Webmastertools) we found the unindexed URLs in the list of pages being excluded because "Google chose different canonical than user". Content-wise, the page that Google falsely determines as canonical instead has little to no similarity to the pages it thereby excludes from the index. False canonicalization About our setup: We are a SPA, delivering our pages pre-rendered, each with an (empty) rel=canonical tag in the HTTP header that's then dynamically filled with a self-referential link to the pages own URL via Javascript. This seemed and seems to work fine for 99% of our pages but happens to fail for one of our top performing ones (which is why the hassle 😉 ). What we tried so far: going through every step of this handy guide: https://moz.com/blog/panic-stations-how-to-handle-an-important-page-disappearing-from-google-case-study --> inconclusive (healthy pages, no penalties etc.) manually requesting re-indexation via Search Console --> immediately brought back some pages, others shortly re-appeared in the index then got kicked again for the aforementioned reasons checking other search engines --> pages are only gone from Google, can still be found via Bing, DuckDuckGo and other search engines Questions to you: How does the Googlebot operate with Javascript and does anybody know if their setup has changed in that respect around the end of January? Could you think of any other reason to cause the behavior described above? Eternally thankful for any help! ldWB9
Intermediate & Advanced SEO | | SvenRi1 -
Large robots.txt file
We're looking at potentially creating a robots.txt with 1450 lines in it. This will remove 100k+ pages from the crawl that are all old pages (I know, the ideal would be to delete/noindex but not viable unfortunately) Now the issue i'm thinking is that a large robots.txt will either stop the robots.txt from being followed or will slow our crawl rate down. Does anybody have any experience with a robots.txt of that size?
Intermediate & Advanced SEO | | ThomasHarvey0 -
Null Alt Image Tags vs Missing Alt Image Tags
Hi, Would it be better for organic search to have a null alt image tag programatically added to thousands of images without alt image tags or just leave them as is. The option of adding tailored alt image tags to thousands of images is not possible. Is having sitewide alt image tags really important to organic search overall or what? Right now, probably 10% of the sites images have alt img tags. A huge number of those images are pages that aren Thanks!
Intermediate & Advanced SEO | | 945010 -
Client has moved to secured https webpages but non secured http pages are still being indexed in Google. Is this an issue
We are currently working with a client that relaunched their website two months ago to have hypertext transfer protocol secure pages (https) across their entire site architecture. The problem is that their non secure (http) pages are still accessible and being indexed in Google. Here are our concerns: 1. Are co-existing non secure and secure webpages (http and https) considered duplicate content?
Intermediate & Advanced SEO | | VanguardCommunications
2. If these pages are duplicate content should we use 301 redirects or rel canonicals?
3. If we go with rel canonicals, is it okay for a non secure page to have rel canonical to the secure version? Thanks for the advice.0 -
Big discrepancies between pages in Google's index and pages in sitemap
Hi, I'm noticing a huge difference in the number of pages in Googles index (using 'site:' search) versus the number of pages indexed by Google in Webmaster tools. (ie 20,600 in 'site:' search vs 5,100 submitted via the dynamic sitemap.) Anyone know possible causes for this and how i can fix? It's an ecommerce site but i can't see any issues with duplicate content - they employ a very good canonical tag strategy. Could it be that Google has decided to ignore the canonical tag? Any help appreciated, Karen
Intermediate & Advanced SEO | | Digirank0 -
Can Google index PDFs with flash?
Does anyone know if Google can index PDF with Flash embedded? I would assume that the regular flash recommendations are still valid, even when embedded in another document. I would assume there is a list of the filetype and version which Google can index with the search appliance, but was not able to find any. Does anyone have a link or a list?
Intermediate & Advanced SEO | | andreas.wpv0 -
Can URLs blocked with robots.txt hurt your site?
We have about 20 testing environments blocked by robots.txt, and these environments contain duplicates of our indexed content. These environments are all blocked by robots.txt, and appearing in google's index as blocked by robots.txt--can they still count against us or hurt us? I know the best practice to permanently remove these would be to use the noindex tag, but I'm wondering if we leave them they way they are if they can still hurt us.
Intermediate & Advanced SEO | | nicole.healthline0 -
Google indexing issue?
Hey Guys, After a lot of hard work, we finally fixed the problem on our site that didn't seem to show Meta Descriptions in Google, as well as "noindex, follow" on tags. Here's my question: In our source code, I am seeing both Meta descriptions on pages, and posts, as well as noindex, follow on tag pages, however, they are still showing the old results and tags are also still showing in Google search after about 36 hours. Is it just a matter of time now or is something else wrong?
Intermediate & Advanced SEO | | ttb0