robots.txt is a text file placed at the root of your website (/robots.txt). It tells search engine crawlers which paths they can crawl and which to avoid. It is a core element of technical SEO for managing crawl budget.
How It Works
robots.txt declares rules per User-agent. The key directives are below.
| Directive | Function |
|---|---|
| User-agent | Specifies the target crawler for the rules |
| Disallow | Paths blocked from crawling |
| Allow | Exceptions permitted within a blocked path |
| Sitemap | Points to the sitemap location |
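Compliant crawlers fetch this file before crawling a site and usually cache it for a while (Google generally refreshes its copy within about 24 hours). Once a mistaken rule is picked up, it can affect crawling across the entire site, and indexing along with it.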
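To make the four directives concrete, here is a minimal sketch that parses a small rule set with Python's standard-library `urllib.robotparser` and asks which paths a crawler may fetch. The domain `example.com` and every path are hypothetical, chosen only for illustration. One assumption to keep in mind: Python's parser applies the first matching rule in file order, whereas Google picks the most specific (longest) matching path, so the Allow exception is listed before the broader Disallow to give the same answer under both interpretations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for example.com; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

for path in ("/", "/admin/", "/admin/help/faq", "/search?q=shoes"):
    url = "https://example.com" + path
    # can_fetch() returns True when the given user agent may crawl the URL.
    print(path, rules.can_fetch("*", url))

# site_maps() (Python 3.8+) lists the URLs declared with Sitemap lines.
print(rules.site_maps())
```

With these rules, `/` and `/admin/help/faq` are reported as crawlable, while `/admin/` and `/search?q=shoes` are blocked. Note that `urllib.robotparser` does plain prefix matching and does not interpret the `*` and `$` wildcards that major search engines support, so it is only a rough stand-in for a real crawler.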
Practical Uses
- Block paths with no index value, such as admin pages and internal search results
- Suppress crawling of duplicate parameter URLs to save crawl budget
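- Declare the sitemap's full URL so crawlers can discover pages more easily
After changing the rules, confirm that Google fetched and parsed the file as intended using the robots.txt report in Google Search Console (it replaced the older robots.txt tester), and spot-check individual pages with the URL Inspection tool.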
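A local check before deployment can complement Search Console. The sketch below is one possible approach, not an official tool: it takes a draft robots.txt plus two lists of URLs (pages that must stay crawlable and pages that must be blocked) and reports every mismatch. The function name `audit_robots`, the draft rules, and the URLs are all hypothetical, and the check inherits the prefix-only matching of Python's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_text, must_allow, must_block, agent="*"):
    """Return a list of problems; an empty list means every expectation holds."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    problems = []
    for url in must_allow:
        if not parser.can_fetch(agent, url):
            problems.append(f"blocked but should be crawlable: {url}")
    for url in must_block:
        if parser.can_fetch(agent, url):
            problems.append(f"crawlable but should be blocked: {url}")
    return problems


# Hypothetical draft: the last rule was meant for a retired section
# but also matches every live product page under /products/.
draft = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Disallow: /products
"""

problems = audit_robots(
    draft,
    must_allow=[
        "https://example.com/",
        "https://example.com/products/blue-shirt",
    ],
    must_block=[
        "https://example.com/admin/login",
        "https://example.com/search?q=shirt",
    ],
)

for line in problems:
    print(line)
```

Here the audit flags the product page as unintentionally blocked. Running a check like this in a deployment pipeline is a design choice that catches overly broad rules before any crawler sees them.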
Common Misconceptions and Cautions
A robots.txt block only stops crawling; it does not reliably prevent indexing. If other pages link to a blocked URL, search engines can still index the URL itself, without its content. To keep a page out of the index, use a noindex directive (the robots meta tag or the X-Robots-Tag HTTP header). For noindex to work, the page must not be blocked in robots.txt, because the crawler has to fetch the page to read the directive.
- Blocking CSS and JS files can prevent search engines from rendering and evaluating pages correctly
- Accidentally blocking important pages can wipe out their organic search traffic
- It is unsuitable for protecting private information (the file itself is public)
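The interaction between robots.txt and noindex described above can be sketched in a few lines. This is an illustrative model under simplifying assumptions, not how any search engine is implemented: it only looks at `<meta name="robots">` tags (not the `X-Robots-Tag` header or crawler-specific meta names), and the helper names `RobotsMetaFinder` and `noindex_status` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def noindex_status(robots_text, url, html, agent="*"):
    """Classify whether a crawler could see a noindex directive on the page."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    if not parser.can_fetch(agent, url):
        # The page is never fetched, so its meta tags are never read.
        return "blocked: noindex cannot be read"
    finder = RobotsMetaFinder()
    finder.feed(html)
    if "noindex" in finder.directives:
        return "crawlable: noindex can be read"
    return "crawlable: no noindex present"


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
url = "https://example.com/private/report"  # hypothetical URL

print(noindex_status("User-agent: *\nDisallow: /private/\n", url, page))
print(noindex_status("User-agent: *\nDisallow:\n", url, page))
```

The first call models the common mistake: the page carries noindex, but the robots.txt block hides it from the crawler. The second call removes the block, so the directive can be read.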
Note
In the AI era, a related proposal called llms.txt has emerged for generative engines. 238lab reviews robots.txt design from both an SEO and a GEO (generative engine optimization) perspective, ensuring that crawl blocking and AI exposure strategy do not conflict.
