What Is A Robots.txt File? A Guide to Best Practices and Syntax
Updated by Chima Mmeje — March 18, 2025.
What Is a Robots.txt File?
A robots.txt file is a plain text file placed in the root directory of a website (for example, yourwebsite.com/robots.txt) to communicate with web crawlers, or bots. It provides instructions, often referred to as rules, on which parts of the website bots can access.
This file is a foundational element of the robots exclusion protocol, a standard that helps manage bot activity across websites. By specifying directives like "Allow" and "Disallow," a robots.txt file gives website owners control over how their directories and pages are crawled. While robots.txt files manage bot activity for the entire site, the meta robots tag applies to individual web pages.
Importance of robots.txt for SEO and website management
A well-configured Robots.txt file offers several benefits for SEO and website efficiency:
- Manage crawling priorities: Direct bots to focus on valuable content while skipping duplicate or irrelevant pages.
- Optimize sitemap usage: Guide crawlers to the sitemap to ensure efficient indexing of key directories.
- Conserve server resources: Reduce unnecessary bot activity, cutting down on excessive HTTP requests that strain the server.
- Protect sensitive files: Discourage crawlers from accessing or indexing confidential or non-public files.
- Enhance SEO strategy: Support better crawl budget allocation and improve website visibility by focusing on the right areas.
Identify and fix robots.txt warnings
with Moz Pro Site Crawl

Examples of robots.txt directives:
Here are a few examples of robots.txt in action for a www.example.com site:
By using specific directives, you can influence which parts of your site search engines crawl, helping the right content surface in Google search results. Meta robots directives can also be used to control how search engines crawl and index specific pages, complementing the rules set in the robots.txt file.
Basic robots.txt format:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
Here it is in practice:
User-agent: Googlebot
Disallow: /example-subfolder/
Together, these two lines are considered a complete robots.txt file — though one robots file can contain multiple lines of user agents and directives (e.g., disallows, allows, crawl-delays, etc.).
Blocking all web crawlers from all content
User-agent: *
Disallow: /
Using this syntax in a robots.txt file would tell all web crawlers not to crawl any pages on www.example.com, including the homepage.
Allowing all web crawlers access to all content
User-agent: *
Disallow:
Using this syntax in a robots.txt file tells web crawlers to crawl all pages on www.example.com, including the home page.
Blocking a specific user agent from a specific folder
User-agent: Googlebot
Disallow: /example-subfolder/
This syntax tells only Google's crawler (user-agent name Googlebot) not to crawl any URLs that begin with www.example.com/example-subfolder/.
Blocking a specific web crawler from a specific webpage
User-agent: Bingbot
Disallow: /example-subfolder/blocked-page.html
This syntax tells only Bing's crawler (user-agent name Bingbot) to avoid crawling the specific page at www.example.com/example-subfolder/blocked-page.html.
How does robots.txt work?
Interaction with search engine crawlers
A robots.txt file acts as a set of instructions for web crawlers (bots) visiting a website. When a bot makes an HTTP request to a website, it first checks the robots.txt file, which tells search engines which sections of the site they can or cannot access. This interaction helps control how content is indexed and ensures that bots focus only on relevant areas of the site.
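This check-then-crawl handshake can be simulated with Python's standard-library `urllib.robotparser`. The sketch below is illustrative — the domain, paths, and rules are placeholders, and the stdlib parser follows the original robots exclusion protocol rather than any one search engine's exact matching behavior:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; a real crawler would fetch them from
# https://www.example.com/robots.txt before requesting any page.
rules = """\
User-agent: Googlebot
Disallow: /example-subfolder/

User-agent: *
Disallow:
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Googlebot is barred from the subfolder but may crawl the homepage.
print(parser.can_fetch("Googlebot", "https://www.example.com/example-subfolder/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://www.example.com/"))  # True
```

Against a live site you would call `parser.set_url("https://www.example.com/robots.txt")` followed by `parser.read()` instead of `parse()`.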
Use cases in SEO and site management
Robots.txt serves several practical purposes in managing a website and optimizing SEO performance:

User-agent identification
Each user-agent has a unique identifier, known as a user-agent string, which can be used to identify the type of browser or crawler. Recognizing these strings helps you tailor your robots.txt file to manage specific user agents.
Preventing unnecessary crawling
Site owners can prevent low-value content from being crawled by disallowing bots from accessing specific directories or pages.
Managing server load
Limiting bot activity on large sites reduces server strain and ensures efficient resource allocation.
Focusing on important content
Directing bots toward high-value sections like an XML sitemap ensures that the most critical pages are indexed first.
Restricting access to sensitive files
Discourage crawlers from visiting confidential or non-public files and directories.
Syntax and core directives for robots.txt
Robots.txt files rely on straightforward directives to communicate instructions to web crawlers. These directives allow website owners to define rules that bots must follow. There are five standard terms you’re likely to come across in a robots file. They include:
User-agent
Specifies which bots the rule applies to (e.g., Googlebot, Bingbot, or * for all bots).
Disallow
Prevents crawlers like Googlebot from accessing specific files, pages, or directories. Each "Disallow:" line holds a single path; use one line per path you want to block.
Allow
Overrides a disallow rule, permitting access to a specific page or resource.
Crawl-delay
Controls how frequently bots access the server by introducing delays between requests. Googlebot does not acknowledge this command, but you can set the crawl rate in Google Search Console.
Sitemap
Directs crawlers to the location of the website’s sitemap for efficient crawling and indexing. Note this command is only supported by Google, Ask, Bing, and Yahoo.
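As a rough check of how these directives combine, Python's `urllib.robotparser` can parse a rules snippet and report crawl delays and sitemap locations. The domain and bot names below are illustrative:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Bingbot
Crawl-delay: 10

User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.crawl_delay("Bingbot"))  # 10
print(parser.site_maps())             # ['https://www.example.com/sitemap.xml']

# Groups do not combine: Bingbot matches its own group, which has no
# Disallow rules, so the "*" group's /private/ block does not apply to it.
print(parser.can_fetch("Bingbot", "https://www.example.com/private/doc.html"))  # True
print(parser.can_fetch("DuckDuckBot", "https://www.example.com/private/doc.html"))  # False
```

The last two lines show a common pitfall: adding a dedicated group for one bot (here, just to set a crawl delay) removes that bot from the catch-all `*` group, so any Disallow rules you still want must be repeated inside the bot's own group.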

Additionally, the robots meta tag (e.g., <meta name="robots" content="noindex, nofollow">) can be assigned values such as 'noindex' or 'nofollow' to control how search engines handle the indexing and crawling of individual web pages.
How to create a robots.txt file
Follow these steps to create a robots.txt file:
File location and naming conventions
- The robots.txt file must be placed in the root directory of your website (e.g., https://www.example.com/robots.txt).
- Ensure the file is named robots.txt to be recognized by web crawlers.
Writing basic directives
- Open a plain text editor to create the file.
- Add rules using key directives like User-agent, Disallow, and Allow.
- Save the file as robots.txt and upload it to the root directory of your website.
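The steps above can be sketched programmatically — the directives, paths, and sitemap URL below are placeholders to swap for your own:

```python
from pathlib import Path

# Placeholder directives; adjust for your site. The Allow line is listed
# before the broader Disallow so parsers that apply rules in order
# (first match wins) also honor the exception.
rules = "\n".join([
    "User-agent: *",
    "Allow: /private/public-report.html",
    "Disallow: /private/",
    "",
    "Sitemap: https://www.example.com/sitemap.xml",
])

# The file must ultimately be served from the web root, e.g.
# https://www.example.com/robots.txt.
path = Path("robots.txt")
path.write_text(rules + "\n", encoding="utf-8")

print(path.read_text(encoding="utf-8"))
```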
Implementing robots meta tags within popular website platforms
On platforms like Wix and WordPress, you can implement robots meta tags to control how search engines index your pages.
- In WordPress, navigate to the 'Edit' section of your page or post, and use the 'Advanced' settings to add 'noindex' or 'nofollow' directives.
- In Wix, go to the 'SEO' settings of your page, and configure the robots meta tags under the 'Advanced SEO' options.
You can test your file with tools like Google’s robots.txt Tester or the Robots.txt Parser. To generate a robots.txt file, use an online tool like Yoast's robots.txt generator that includes pre-configured templates.
Best practices for robots.txt
To ensure your robots.txt file functions optimally, follow these best practices:
File location issues
Place robots.txt in the root directory (e.g., www.example.com/robots.txt).
Ensure correct syntax
Validate formatting with tools like Google’s Robots Testing Tool to avoid errors.
Don’t block CSS or JavaScript
Allow access to resources needed for proper rendering.
Use a sitemap directive
Link your sitemap to guide crawlers to essential content.
Monitor crawler behavior
Check server logs or analytics to confirm compliance with your rules.
Meta robot tags are also crucial in managing SEO and preventing issues like accidental noindex directives.
How to identify robots.txt issues in Moz Pro
Robots.txt files control how search engines crawl your site. Errors in this file can block important pages from being indexed or allow access to areas that shouldn’t be crawled. Moz Pro’s On-Demand Crawl makes it easy to surface and fix these issues.
Step 1: Run an On-Demand Crawl
In your Moz Pro dashboard, go to On-Demand Crawl and enter the subdomain or URL you want to scan.

Once the crawl is completed, you’ll get a full report of technical SEO issues, including anything related to robots.txt directives.
Step 2: Check for critical crawl issues
In the Crawl Report, look under All Issues to view warnings like X-Robots Noindex and Meta Noindex.

Step 3: Filter for robots.txt-related issues
In the Pages Crawled section, use the Issue Types dropdown to filter for Meta Noindex and X-Robots Noindex. This will show you all pages affected, along with key metrics like crawl depth and page authority.

Step 4: Review and troubleshoot
Once you’ve identified affected URLs:
- Check your robots.txt file directly at yourdomain.com/robots.txt to review the rules
- Make sure important pages aren’t unintentionally blocked
- Adjust Disallow or Allow directives as needed
- Review the configurations in your CMS or server settings for pages blocked via meta tags or HTTP headers (like X-Robots-Tag).
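The "make sure important pages aren't unintentionally blocked" step can be scripted with Python's `urllib.robotparser`. This sketch checks a hypothetical URL list against inline rules; in practice you would point the parser at your live yourdomain.com/robots.txt instead:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; for a live site, use
# parser.set_url("https://yourdomain.com/robots.txt") and parser.read().
robots_txt = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /checkout/
"""

# Replace with the URLs you expect search engines to reach.
important_urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/robots-guide",
    "https://www.example.com/checkout/",
]

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for url in important_urls:
    status = "ok" if parser.can_fetch("Googlebot", url) else "BLOCKED"
    print(f"{status:8}{url}")
```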
Step 5: Rerun the crawl to confirm fixes
After making updates to your robots.txt or meta directives, run another crawl in Moz Pro. You should see those issues cleared or reduced in the next report. Ongoing monitoring helps ensure search engines can access and index the right pages.
Common mistakes to avoid
Robots.txt files are simple but prone to mistakes that can negatively impact your website’s visibility or functionality:

Wrong file format
Save as a plain text file with UTF-8 encoding to ensure readability.
Overly restrictive rules
Avoid blocking critical directories or pages needed for SEO.
Skipping testing
Regularly test your robots.txt with tools like Google’s tester to ensure functionality.
Ignoring crawler differences
Tailor rules to the behavior of specific user agents.
Failing to update
Revise robots.txt as your website structure changes.
Can I block AI bots with robots.txt?
Yes, robots.txt can be used to exclude AI bots like ClaudeBot, GPTBot, and PerplexityBot. Many news and publication websites have already blocked AI bots. For instance, research by Moz’s Senior Search Scientist, Tom Capper, shows that GPTBot is the most blocked bot. However, whether blocking AI bots is the right move for your site, and whether all AI bots will honor the directive, remains open to debate.
How to block AI bots with robots.txt:
To block AI bots, enter their unique user-agent and the areas of your site that you would like to exclude. For example:
User-agent: GPTBot
Disallow: /blog
Disallow: /learn/seo
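To sanity-check that a block like this behaves as intended before deploying it, you can run the rules through Python's `urllib.robotparser` (the domain and paths are illustrative; OpenAI documents its crawler's user-agent token as GPTBot):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /blog
Disallow: /learn/seo
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("GPTBot", "https://www.example.com/blog/post"))  # False
print(parser.can_fetch("GPTBot", "https://www.example.com/about"))      # True

# Other crawlers are unaffected: no group in this file matches Googlebot.
print(parser.can_fetch("Googlebot", "https://www.example.com/blog/post"))  # True
```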

FAQs about robots.txt
How do I check if I have a robots.txt file?
To check if your site has a robots.txt file, enter your root domain followed by /robots.txt. For example:
www.example.com/robots.txt
If no robots.txt file appears, your site doesn’t have a live file.
Is Robots.txt legally enforceable?
No, robots.txt is not legally enforceable. It operates as a voluntary protocol, meaning that while most well-behaved bots (like search engine crawlers) honor the directives, malicious bots or scrapers can choose to ignore it.
Is robots.txt still relevant for search results today?
Yes, robots.txt remains relevant today as an effective tool for managing bot access and prioritizing the crawling of critical website areas.
What is the difference between robots.txt vs meta robots tag vs X-Robots-Tag?
Robots.txt is a text file that dictates crawl behavior for an entire site or directory. The meta robots tag and X-Robots-Tag are page-level directives used for controlling indexation on individual pages or page elements.