What Is the Robots Exclusion Protocol? (robots.txt file)

robots.txt is a text file placed at the root of your website (/robots.txt). It tells search engine crawlers which paths they can crawl and which to avoid. It is a core element of technical SEO for managing crawl budget.

How It Works

robots.txt declares rules per User-agent. The key directives are below.

Directive	Function
`User-agent`	Specifies the target crawler for the rules
`Disallow`	Paths blocked from crawling
`Allow`	Exceptions permitted within a blocked path
`Sitemap`	Points to the sitemap location

Well-behaved crawlers fetch this file before requesting other URLs on a site, so an incorrect rule can affect crawling, and in turn indexing, across the whole site.
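To make the directives concrete, here is a minimal sketch in Python that parses a small rule set with the standard-library `urllib.robotparser` module and asks whether a few URLs may be crawled. The domain, the paths, and the `ExampleBot` user agent are hypothetical. Note that this parser applies the first matching rule in file order, whereas Google picks the most specific matching rule, so the `Allow` exception is listed before the broader `Disallow` here to get the same answer under either approach.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; example.com and the paths are assumptions.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Allow exception inside a blocked path.
print(rules.can_fetch("ExampleBot", "https://example.com/admin/help/faq"))
# Blocked by Disallow: /admin/
print(rules.can_fetch("ExampleBot", "https://example.com/admin/users"))
# No rule matches, so crawling is permitted by default.
print(rules.can_fetch("ExampleBot", "https://example.com/products/1"))
# Sitemap URLs declared in the file (requires Python 3.8 or later).
print(rules.site_maps())
```

The standard-library parser does plain prefix matching, so treat it as an approximation of how a given search engine interprets the same file.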

Practical Uses

Block paths with no index value, such as admin pages and internal search results
Suppress crawling of duplicate parameter URLs to save crawl budget
Specify the sitemap URL to help crawlers discover the pages you want indexed

After blocking, verify the behavior in Google Search Console, using the robots.txt report (which replaced the earlier robots.txt tester) and the URL Inspection tool.
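The uses listed above can be sketched as a small Python helper that renders a robots.txt file from a list of blocked paths and a sitemap URL. The function name `build_robots_txt`, the paths, and the domain are assumptions made for illustration. The `/*?sort=` pattern relies on the `*` wildcard, which major search engines support but some simpler parsers do not.

```python
def build_robots_txt(disallow_paths, sitemap_url=None, user_agent="*"):
    """Render a minimal robots.txt with one rule group and an optional sitemap."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow_paths]
    if sitemap_url:
        # The Sitemap line stands outside the rule group and takes a full URL.
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Hypothetical paths: an admin area, internal search results,
# and duplicate URLs created by a sort parameter.
robots_txt = build_robots_txt(
    ["/admin/", "/search", "/*?sort="],
    sitemap_url="https://example.com/sitemap.xml",
)
print(robots_txt)
```

Generating the file from a reviewed list keeps the blocked paths in one place, which makes accidental additions easier to spot.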

Common Misconceptions and Cautions

A robots.txt block only stops crawling; it does not fully prevent indexing. If other sites link to a blocked page, its URL can still be indexed without its content. To reliably keep a page out of the index, use the noindex meta tag, and leave the page crawlable so that crawlers can actually read the directive.
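The interaction between a crawl block and noindex can be checked mechanically. The following Python sketch, built only on the standard library, reports a conflict when a page carries a noindex meta tag but is also blocked in robots.txt. The names `RobotsMetaFinder` and `noindex_status`, the sample rules, and the URLs are hypothetical, and the check covers only the meta tag, not HTTP headers.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def noindex_status(robots_txt, url, html, user_agent="Googlebot"):
    """Return 'conflict', 'ok', or 'absent' for the page's noindex tag."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    if "noindex" not in finder.directives:
        return "absent"
    # A blocked page is never fetched, so its noindex tag is never seen.
    return "ok" if rules.can_fetch(user_agent, url) else "conflict"


# Hypothetical inputs for illustration.
RULES = "User-agent: *\nDisallow: /private/\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(noindex_status(RULES, "https://example.com/private/page", PAGE))
print(noindex_status(RULES, "https://example.com/public/page", PAGE))
```

A "conflict" result means one of the two settings has to go: either lift the crawl block so the noindex can be read, or accept that the URL may still appear in the index.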

Blocking CSS and JS files can prevent search engines from rendering and evaluating pages correctly
Accidentally blocking important pages can wipe out traffic entirely
It is unsuitable for protecting private information (the file itself is public)
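One way to guard against the first two cautions is to test a draft file against a list of URLs that must stay crawlable before publishing it. This is a minimal sketch, assuming a hypothetical draft rule set and URL list; `blocked_urls` is an illustrative name. It uses `urllib.robotparser`, which does prefix matching only and does not interpret `*` or `$` patterns, so rules that rely on wildcards need a different checker.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs from `urls` that the given rules would block."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [u for u in urls if not rules.can_fetch(user_agent, u)]


# Hypothetical draft that blocks the asset directory by mistake.
DRAFT = """\
User-agent: *
Disallow: /assets/
Disallow: /admin/
"""

# Hypothetical pages and resources that crawlers need to reach.
MUST_CRAWL = [
    "https://example.com/",
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
]

problems = blocked_urls(DRAFT, MUST_CRAWL)
print(problems)
```

An empty result suggests the draft leaves the listed URLs reachable; any entries point to rules worth reviewing before deployment.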

Note

In the AI era, a similar concept called llms.txt has emerged for generative engines. 238lab reviews robots.txt design from both an SEO and GEO perspective, ensuring that crawl blocking and AI exposure strategy do not conflict.
