robots.txt is a plain-text file at the domain root (/robots.txt) that tells search bots which URLs they may crawl and which to skip. It implements the Robots Exclusion Protocol (REP), formalized in RFC 9309 (2022). Important: robots.txt is a *recommendation*, not a hard block. Malicious bots simply ignore it; for real access control, use authentication or a firewall.
Below: details, example, related terms, FAQ.
User-agent: *
Disallow: /admin/
Disallow: /*.pdf$
Allow: /admin/public/
Sitemap: https://example.com/sitemap.xml

What happens if a site has no robots.txt? Google keeps crawling as usual, and so does Yandex. But if the request for robots.txt returns a 5xx error, Google halts crawling for 12 hours. Make sure the file is served with a 200, or a clean 404 if you have none.
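A crawler evaluates each URL against these rules before fetching it. A minimal sketch using Python's standard urllib.robotparser (note: this parser does plain prefix matching and implements neither the * and $ wildcards nor Google's longest-match tie-breaking, so the wildcard line from the example is left out):

```python
from urllib import robotparser

# The example rules above, minus the wildcard line, which
# urllib.robotparser does not understand (prefix matching only).
RULES = """\
User-agent: *
Disallow: /admin/
Allow: /admin/public/
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

print(rp.can_fetch("MyBot", "https://example.com/admin/secret"))  # False
print(rp.can_fetch("MyBot", "https://example.com/blog/post"))     # True
```

In production you would call rp.set_url(".../robots.txt") and rp.read() instead of parsing a string.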
Does Disallow remove a page from the index? No. Disallow blocks *crawling*, not *indexing*: an external link can still put the URL into the index, just without its content. To keep a page out of the index, use a meta noindex tag, and keep the page crawlable so the bot can actually see that tag.
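The noindex directive lives on the page itself, not in robots.txt:

```html
<!-- In the page's <head>; the page must stay crawlable so bots can read this -->
<meta name="robots" content="noindex">
```

For non-HTML files such as PDFs, the same signal can be sent as an HTTP response header: X-Robots-Tag: noindex.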
Are wildcards supported? In the major bots, yes: Disallow: /*.pdf$ blocks every URL ending in .pdf (* matches any sequence of characters, $ anchors the end). RFC 9309 formalized wildcard support.
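The wildcard semantics are easy to reproduce yourself. A minimal sketch that translates a robots.txt path pattern into a Python regex (illustrative only, not a full RFC 9309 matcher):

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors
    the match at the end of the path. As in robots.txt, matching is
    anchored at the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as '.*'
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(regex + ("$" if anchored else ""))

rule = robots_pattern_to_regex("/*.pdf$")
print(bool(rule.match("/files/report.pdf")))      # True
print(bool(rule.match("/files/report.pdf?v=2")))  # False: '$' anchors the end
print(bool(rule.match("/files/report.html")))     # False
```

Without the trailing $, the same pattern would also match /report.pdf?v=2, since bots match patterns against the path plus query string.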