238LAB Corp

robots.txt File

Robots Exclusion Protocol


robots.txt is a text file placed at the root of your website (/robots.txt). It tells search engine crawlers which paths they can crawl and which to avoid. It is a core element of technical SEO for managing crawl budget.

How It Works

robots.txt declares rules per User-agent. The key directives are below.

Directive     Function
User-agent    Specifies the target crawler for the rules
Disallow      Paths blocked from crawling
Allow         Exceptions permitted within a blocked path
Sitemap       Points to the sitemap location

Compliant crawlers fetch this file before requesting other URLs on a site. An incorrect rule can therefore affect crawling, and in turn indexing, across the whole site.
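To make the directives concrete, here is a minimal sketch that parses a sample file with Python's standard urllib.robotparser module. The domain example.com, the paths, and the crawler name ExampleBot are assumptions chosen for illustration. Note that this parser applies the first matching rule in file order, so the Allow exception is listed before the broader Disallow; other crawlers may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; example.com and all paths are illustrative.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name; it falls under the "*" group.
print(parser.can_fetch("ExampleBot", "https://example.com/admin/settings"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/admin/help/faq"))  # True
print(parser.can_fetch("ExampleBot", "https://example.com/search?q=shoes"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/products/shoes"))  # True

# site_maps() requires Python 3.8 or later.
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

A path that matches no rule is treated as crawlable, which is why /products/shoes is allowed even though the file never mentions it.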

Practical Uses

  • Block paths with no index value, such as admin pages and internal search results
  • Suppress crawling of duplicate parameter URLs to save crawl budget
  • Specify the sitemap path to speed up index discovery

After blocking, verify the behavior in Google Search Console. The standalone robots.txt tester has been retired, so use the robots.txt report together with the URL Inspection tool.
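The parameter-URL case above usually relies on wildcard patterns. The sketch below is a simplified matcher, assuming the precedence described in RFC 9309: "*" matches any run of characters, a trailing "$" anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. The helper names and the sample rules are hypothetical, pattern length is measured in characters here, and real crawlers may differ in detail.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal segments and join them with ".*" for each "*" wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Hypothetical rules for one User-agent group: block sort parameters and
# internal search results, but keep a single help page crawlable.
RULES = [
    ("Disallow", "/*?sort="),
    ("Disallow", "/*&sort="),
    ("Disallow", "/search"),
    ("Allow", "/search/help$"),
]

print(is_allowed(RULES, "/shoes?sort=price"))            # False
print(is_allowed(RULES, "/shoes?color=red&sort=price"))  # False
print(is_allowed(RULES, "/shoes?color=red"))             # True
print(is_allowed(RULES, "/search/help"))                 # True
print(is_allowed(RULES, "/search?q=shoes"))              # False
```

Two patterns are used for the sort parameter because it can appear either first (after "?") or later (after "&") in the query string.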

Common Misconceptions and Cautions

A robots.txt block only stops crawling; it does not reliably prevent indexing. If other pages link to a blocked URL, the URL alone can still be indexed, without its content. To reliably block indexing, use a noindex robots meta tag (or the equivalent X-Robots-Tag HTTP header). For this to work, the page must remain crawlable, that is, not blocked in robots.txt, so that the crawler can read the noindex directive.
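The interaction between the two mechanisms can be sketched as a small check: a noindex directive can only take effect if the crawler is allowed to fetch the page that carries it. The function names, the crawler name ExampleBot, and the sample URL and HTML below are hypothetical, and the check covers only the meta tag, not the HTTP header.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a noindex robots meta directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def noindex_can_take_effect(robots_txt: str, user_agent: str, url: str, html: str) -> bool:
    """noindex is only readable when robots.txt does not block the page."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    return robots.can_fetch(user_agent, url) and has_noindex(html)


# Illustrative inputs; example.com and /thanks are made up.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
URL = "https://example.com/thanks"
BLOCKING = "User-agent: *\nDisallow: /thanks\n"
NON_BLOCKING = "User-agent: *\nDisallow: /admin/\n"

print(noindex_can_take_effect(BLOCKING, "ExampleBot", URL, PAGE))      # False
print(noindex_can_take_effect(NON_BLOCKING, "ExampleBot", URL, PAGE))  # True
```

In the first case the page is blocked from crawling, so the noindex tag is never seen and the bare URL may still be indexed through external links.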

  • Blocking CSS and JS files can prevent search engines from rendering and evaluating the page correctly
  • Accidentally blocking important pages can wipe out their search traffic entirely
  • It is unsuitable for protecting private information (the file itself is public, and compliance is voluntary)

Note

In the AI era, a related proposal called llms.txt has emerged for generative engines. 238lab reviews robots.txt design from both an SEO and a GEO (generative engine optimization) perspective, ensuring that crawl blocking and AI exposure strategy do not conflict.
