Do I need robots.txt and meta robots?
-
If I can manage to tell crawlers what I do and don't want them to crawl for my whole site via my robots.txt file, do I still need meta robots instructions?
-
Older information, but mostly still relevant:
-
Although robots.txt and meta robots appear to do similar things, they both serve different functions.
Block with Robots.txt - This tells the engines to not crawl the given URL but tells them that they may keep the page in the index and display it in in results.
Block with Meta NoIndex - This tells engines they can visit but they are not allowed to display the URL in results. (this is a suggestion only - Google may still choose to show the URL)
Source: http://www.seomoz.org/learn-seo/robotstxt
The disadvantage of robots.txt is that it blocks Google from crawling the page, meaning no link juice can flow through the page, and if Google discovers the URL through other means (external links) it may show the URL anyway in search results, usually without a meta description.
The advantage of robots.txt is it can improve crawl efficiency - useful if you find Google crawling a bunch of unnecessary pages and eating up your crawl allowance.
Most of the time, I only use robots.txt to solve problems that I can't solve at the page level. I usually prefer to keep pages out of the index using a meta NOINDEX, FOLLOW tag.
-
If you want the stub listing removed as well, this is quite straight forward once you have it blocked in Robots. Instructions here: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1663419
Just checking though: If the content you are trying to remove is something private that should be hidden (as opposed to just low value stuff that you don't want cluttering the SERPS) then this isn't the right way to go about it. If that is the case reply back.
-
Hello Mat,
As far as I know if I blocked a url using robots.txt.For that page I will get only url in serps but i want to remove url from serps also.How to do that?
-
In short, no. You only need to include the instruction in one or the other. Most people find that the robots.txt file is the preferred solution because it will only take a few lines to specify which parts of a well structured site should and should not be crawled.
-
What do you mean by meta robots instructions? Are you referring to the meta tags that go on each individual page? In that case, no, you don't necessarily need them. Robots assume a page should be crawled unless told otherwise. I'd still do it for pages that you don't want indexed and/or followed because a lot of times, robots, especially Google, seem to ignore these directives.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why Google ranks a page with Meta Robots: NO INDEX, NO FOLLOW?
Hi guys, I was playing with the new OSE when I found out a weird thing: if you Google "performing arts school london" you will see w w w . mountview . org. uk at the 3rd position. The point is that page has "Meta Robots: NO INDEX, NO FOLLOW", why Google indexed it? Here you can see the robots.txt allows Google to index the URL but not the content, in article they also say the meta robots tag will properly avoid Google from indexing the URL either. Apparently, in my case that page is the only one has the tag "NO INDEX, NO FOLLOW", but it's the home page. so I said to myself: OK, perhaps they have just changed that tag therefore Google needs time to re-crawl that page and de-index following the no index tag. How long do you think it will take to don't see that page indexed? Do you think it will effect the whole website, as I suppose if you have that tag on your home page (the root domain) you will lose a lot of links' juice - it's totally unnatural a backlinks profile without links to a root domain? Cheers, Pierpaolo
Technical SEO | | madcow780 -
Should I put meta descriptions on pages that are not indexed?
I have multiple pages that I do not want to be indexed (and they are currently not indexed, so that's great). They don't have meta descriptions on them and I'm wondering if it's worth my time to go in and insert them, since they should hypothetically never be shown. Does anyone have any experience with this? Thanks! The reason this is a question is because one member of our team was linking to this page through Facebook to send people to it and noticed random text on the page being pulled in as the description.
Technical SEO | | Viewpoints0 -
Have I constructed my robots.txt file correctly for sitemap autodiscovery?
Hi, Here is my sitemap: User-agent: * Sitemap: http://www.bedsite.co.uk/sitemaps/sitemap.xml Directories Disallow: /sendfriend/
Technical SEO | | Bedsite
Disallow: /catalog/product_compare/
Disallow: /media/catalog/product/cache/
Disallow: /checkout/
Disallow: /categories/
Disallow: /blog/index.php/
Disallow: /catalogsearch/result/index/
Disallow: /links.html I'm using Magento and want to make sure I have constructed my robots.txt file correctly with the sitemap autodiscovery? thanks,0 -
Googlebot does not obey robots.txt disallow
Hi Mozzers! We are trying to get Googlebot to steer away from our internal search results pages by adding a parameter "nocrawl=1" to facet/filter links and then robots.txt disallow all URLs containing that parameter. We implemented this late august and since that, the GWMT message "Googlebot found an extremely high number of URLs on your site", stopped coming. But today we received yet another. The weird thing is that Google gives many of our nowadays robots.txt disallowed URLs as examples of URLs that may cause us problems. What could be the reason? Best regards, Martin
Technical SEO | | TalkInThePark0 -
Correct Way to Write Meta
OK so this is a really, really basic question. However, I'm seeing some meta written differently to normal and I'm wondering if a) this is correct and b) whether there is any benefit. Normally it's like this: However, I am seeing it written like this is some places: So, the content= and name= are swapped around. I assume the people that did this were thinking that bringing the content forward would mean that Google reads keywords first. Just wondering if anybody knows whether this is good practice or not? Just spiked my interest so apologies for the basic nature of the question!
Technical SEO | | RiceMedia0 -
Robots.txt
Hello Everyone, The problem I'm having is not knowing where to have the robots.txt file on our server. We have our main domain (company.com) with a robots.txt file in the root of the site, but we also have our blog (company.com/blog) where were trying to disallow certain directories from being crawled for SEO purposes... Would having the blog in the sub-directory still need its own robots.txt? or can I reference the directories i don't want crawled within the blog using the root robots.txt file? Thanks for your insight on this matter.
Technical SEO | | BailHotline0 -
What research needs to be done prior to purchasing an old domain?
I'm planning on purchasing an old domain. I'd like to get an idea whether or not any "shadyness" was going on with the domain over the years. It's still indexed in google. I "whois'd" it, ran it through open site explorer, and did a "link:" search. Anything else you would recommend for researching this domain?
Technical SEO | | Gyi0