Glossary

Robots.txt

A text file placed in a website's root directory that tells search engine crawlers which pages or sections they may or may not crawl.

Robots.txt is a plain-text file stored at the root of a host (for example, example.com/robots.txt) that contains directives telling crawlers how to behave on a site. Each subdomain needs its own file. Its simple syntax groups Allow and Disallow rules under User-agent lines, so a site can permit or block specific crawlers (like Googlebot) from particular paths, files, or the entire site. It can also give the location of a sitemap with a Sitemap line. Some crawlers honor a Crawl-delay directive as well, although Googlebot ignores it.
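As a rough illustration of that syntax, the sketch below parses a made-up robots.txt with Python's standard-library `urllib.robotparser` and asks how two crawlers would be treated. The file contents, the paths, and the `SomeBot` user agent are assumptions for the example, not rules taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
SAMPLE = """\
User-agent: *
Disallow: /search
Disallow: /login
Crawl-delay: 5

User-agent: Googlebot
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# Googlebot has its own group, so only that group's rules apply to it.
print(parser.can_fetch("Googlebot", "https://example.com/staging/home"))   # False
print(parser.can_fetch("Googlebot", "https://example.com/search/results"))  # True

# A crawler without its own group falls back to the "*" group.
print(parser.can_fetch("SomeBot", "https://example.com/search/results"))    # False
print(parser.crawl_delay("SomeBot"))                                        # 5

# Sitemap lines are independent of any user-agent group (Python 3.8+).
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

Note that a crawler with its own group does not inherit rules from the `*` group, which is why Googlebot may still fetch `/search/results` here.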

Robots.txt is not a security mechanism: it's a *request*, not a barrier. Well-behaved crawlers follow it, but any crawler can ignore it. The file itself is publicly readable, so sensitive information should never be placed there, and a Disallow line for a secret path only advertises that path to anyone who opens the file. It's useful for preventing crawlers from wasting bandwidth on duplicate content, staging environments, or non-indexable pages like login forms or internal search results.

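SEOs use robots.txt to manage crawl budget and protect server resources, but it should never be the only way to keep pages out of search indexes: a disallowed URL can still be indexed, without its content, if other pages link to it. For actual de-indexing, use a noindex robots meta tag or an X-Robots-Tag HTTP header, and leave the page crawlable so the crawler can actually see the directive. A misconfigured robots.txt, especially one that blocks Googlebot from accessing stylesheets or JavaScript, can prevent Google from rendering pages properly and hurt SEO performance.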
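Both misconfigurations mentioned here (blocked rendering resources, and noindex pages the crawler is not allowed to fetch) can be spotted with a small check. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `blocked_urls`, the robots.txt contents, and all URLs are hypothetical. The standard-library parser uses plain prefix matching and does not understand the `*` and `$` wildcards that Googlebot supports, so treat the result as an approximation of Google's behavior.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs that robots_txt disallows for user_agent (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that blocks an asset directory and an admin area.
ROBOTS = """\
User-agent: *
Disallow: /assets/
Disallow: /admin/
"""

# Assumed resources a page needs in order to render.
RENDER_RESOURCES = [
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
    "https://example.com/images/logo.png",
]

# Assumed pages that carry a noindex directive.
NOINDEX_PAGES = [
    "https://example.com/admin/old-report",
    "https://example.com/thank-you",
]

# Stylesheets and scripts Googlebot could not fetch when rendering.
print(blocked_urls(ROBOTS, "Googlebot", RENDER_RESOURCES))

# noindex pages that are also disallowed: the crawler never sees the noindex.
print(blocked_urls(ROBOTS, "Googlebot", NOINDEX_PAGES))
```

With these inputs, the first call reports the two files under `/assets/` and the second reports `/admin/old-report`. Either result would be a reason to loosen the corresponding Disallow rule.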