What is the sense of robots.txt?
-
Using robots.txt to prevent search engine from indexing the page is not a good idea. so what is the sense of robots.txt? just for attracting robots to crawl sitemap?
-
While your robots.txt file is not the best means to control search engines, it does have a purpose. To respond to your questions:
-
the file does not "attract" any robots, but robots who do visit can learn a bit about your site and understand what content you don't wish to be crawled
-
you can block parts of your site that you feel have no value for indexing such as Keri mentioned your "print" version of pages, or overlays pages, or login pages, etc.
The idea is that you own the website, and you can have a measure of control over it. You can disallow specific crawlers, etc. although it's up to each crawler whether they actually respect your wishes.
More details can be read at: http://www.robotstxt.org/
-
-
There are often times pages you don't want indexed, and that's what robots.txt is there for. These are just some things you may not want indexed:
- Premium content for subscription-only members
- Your admin directory
- Printable versions of pages
- Development servers
You keep things you don't want out of the index, and you also don't waste the crawl budgets of the search engines on stuff that's not what you want in the engines in the first place.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
No descripton on Google/Yahoo/Bing, updated robots.txt - what is the turnaround time or next step for visible results?
Hello, New to the MOZ community and thrilled to be learning alongside all of you! One of our clients' sites is currently showing a 'blocked' meta description due to an old robots.txt file (eg: A description for this result is not available because of this site's robots.txt) We have updated the site's robots.txt to allow all bots. The meta tag has also been updated in WordPress (via the SEO Yoast plugin) See image here of Google listing and site URL: http://imgur.com/46wajJw I have also ensured that the most recent robots.txt has been submitted via Google Webmaster Tools. When can we expect these results to update? Is there a step I may have overlooked? Thank you,
Technical SEO | | adamhdrb
Adam 46wajJw0 -
Robots.txt
Google Webmaster Tools say our website's have low-quality pages, so we have created a robots.txt file and listed all URL’s that we want to remove from Google index. Is this enough for the solve problem?
Technical SEO | | iskq0 -
Do you get credit for an external link that points to a page that's being blocked by robots.txt
Hi folks, No one, including me seems to actually know what happens!? To repeat: If site A links to /home.html on site B and site B blocks /home.html in Robots.txt, does site B get credit for that link? Does the link pass PageRank? Will Google still crawl through it? Does the domain get some juice, but not the page? I know there's other ways of doing this properly, but it is interesting no?
Technical SEO | | DaveSottimano0 -
Help needed with robots.txt regarding wordpress!
Here is my robots.txt from google webmaster tools. These are the pages that are being blocked and I am not sure which of these to get rid of in order to unblock blog posts from being searched. http://ensoplastics.com/theblog/?cat=743 http://ensoplastics.com/theblog/?p=240 These category pages and blog posts are blocked so do I delete the /? ...I am new to SEO and web development so I am not sure why the developer of this robots.txt file would block pages and posts in wordpress. It seems to me like that is the reason why someone has a blog so it can be searched and get more exposure for SEO purposes. IS there a reason I should block any pages contained in wodrpress? Sitemap: http://www.ensobottles.com/blog/sitemap.xml User-agent: Googlebot Disallow: /*/trackback Disallow: /*/feed Disallow: /*/comments Disallow: /? Disallow: /*? Disallow: /page/
Technical SEO | | ENSO
User-agent: * Disallow: /cgi-bin/ Disallow: /wp-admin/ Disallow: /wp-includes/ Disallow: /wp-content/plugins/ Disallow: /wp-content/themes/ Disallow: /trackback Disallow: /commentsDisallow: /feed0 -
Mobile site: robots.txt best practices
If there are canonical tags pointing to the web version of each mobile page, what should a robots.txt file for a mobile site have?
Technical SEO | | bonnierSEO0 -
Is robots.txt a must-have for 150 page well-structured site?
By looking in my logs I see dozens of 404 errors each day from different bots trying to load robots.txt. I have a small site (150 pages) with clean navigation that allows the bots to index the whole site (which they are doing). There are no secret areas I don't want the bots to find (the secret areas are behind a Login so the bots won't see them). I have used rel=nofollow for internal links that point to my Login page. Is there any reason to include a generic robots.txt file that contains "user-agent: *"? I have a minor reason: to stop getting 404 errors and clean up my error logs so I can find other issues that may exist. But I'm wondering if not having a robots.txt file is the same as some default blank file (or 1-line file giving all bots all access)?
Technical SEO | | scanlin0 -
Robots.txt and robots meta
I have an odd situation. I have a CMS that has a global robots.txt which has the generic User-Agent: *
Technical SEO | | Highland
Allow: / I also have one CMS site that needs to not be indexed ever. I've read in various pages (like http://www.jesterwebster.com/robots-txt-vs-meta-tag-which-has-precedence/22 ) that robots.txt always wins over meta, but I have also read that robots.txt indicates spiderability whereas meta can control indexation. I just want the site to not be indexed. Can I leave the robots.txt as is and still put NOINDEX in the robots meta?0 -
When does it make sense to use no-follow on your own domain?
Hey guys, I'm not too sure if I'm over-thinking this, but I've seen no-follow being used with SEOmoz and I'm looking to implement this myself. Most of my links point to my root domain (yes I'm working on building links to deep pages) so would it make sense to 'limit' or 'no-follow' links on my root domain so that only the most important pages are being passed link juice? Thanks
Technical SEO | | reegs0