
The Complete Guide to robots.txt for SEO and Crawl Control

The robots.txt file is a simple text file at your website's root that tells search engine crawlers which pages they can and cannot access. Despite its simplicity, misconfigurations can cause serious SEO damage — accidentally blocking your entire site from indexing is more common than you'd think.

How robots.txt Works

When a search engine crawler visits your site, it first checks https://example.com/robots.txt. The file contains directives that specify which paths are allowed or disallowed for each crawler (user-agent). Crawlers follow these rules voluntarily — robots.txt is a protocol, not a security measure.

Basic Syntax

# Allow all crawlers access to everything
User-agent: *
Allow: /

# Block all crawlers from /admin/
User-agent: *
Disallow: /admin/

# Block Googlebot from a specific directory
User-agent: Googlebot
Disallow: /private/

# Sitemap location
Sitemap: https://example.com/sitemap.xml

Key Directives

User-agent: names the crawler the rules that follow apply to (* matches any crawler).
Disallow: blocks crawling of URLs that start with the given path.
Allow: explicitly permits a path, carving an exception out of a broader Disallow.
Sitemap: gives crawlers the absolute URL of your XML sitemap.

Pattern Matching

Most major crawlers, including Googlebot and Bingbot, support two pattern-matching characters: * matches any sequence of characters, and $ anchors a rule to the end of the URL (for example, Disallow: /*.pdf$ blocks every PDF). Smaller crawlers may treat these characters literally, so keep critical rules simple.
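To make the wildcard semantics concrete, here is a simplified model in Python that converts a robots.txt pattern into a regular expression. It is an illustrative sketch only: it ignores Google's longest-match tie-breaking and other edge cases.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Convert a robots.txt path pattern to a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors
    the pattern to the end of the URL. Simplified sketch only.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape everything except '*', which becomes '.*'
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

print(bool(robots_pattern_to_regex("/*?q=").match("/search?q=shoes")))     # True
print(bool(robots_pattern_to_regex("/*.pdf$").match("/files/a.pdf")))      # True
print(bool(robots_pattern_to_regex("/*.pdf$").match("/files/a.pdf?v=2")))  # False
```

Note how the trailing $ changes behavior: without it, /*.pdf would also match a PDF URL with query parameters appended.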

Common Patterns

Block Internal Search Results

User-agent: *
Disallow: /search
Disallow: /*?q=
Disallow: /*?s=

Block URL Parameters

User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*&page=

Block Development/Staging Areas

User-agent: *
Disallow: /staging/
Disallow: /dev/
Disallow: /test/

Allow CSS/JS for Rendering

User-agent: *
Allow: /assets/css/
Allow: /assets/js/
Allow: /assets/images/
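When an Allow and a Disallow rule both match a URL, Google applies the most specific (longest) matching rule, which is why Allow lines can carve exceptions out of a broader Disallow. A minimal model of that resolution logic, using prefix rules only (no wildcards) as an illustrative sketch, not Google's implementation:

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Resolve Allow/Disallow by the longest matching prefix.

    rules: (directive, prefix) pairs, e.g. ("Disallow", "/assets/").
    Simplified sketch: prefix matching only, no wildcard support.
    """
    best_len, allowed = -1, True  # no matching rule -> allowed by default
    for directive, prefix in rules:
        if path.startswith(prefix) and len(prefix) > best_len:
            best_len, allowed = len(prefix), directive == "Allow"
    return allowed

rules = [("Disallow", "/assets/"), ("Allow", "/assets/css/")]
print(is_allowed("/assets/css/site.css", rules))  # True: Allow is more specific
print(is_allowed("/assets/private.zip", rules))   # False: only Disallow matches
```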

robots.txt vs meta robots vs X-Robots-Tag

Method       | Scope                     | Prevents Crawling | Prevents Indexing
robots.txt   | Entire paths/directories  | Yes               | No (indirectly)
meta robots  | Individual pages          | No                | Yes (noindex)
X-Robots-Tag | Any URL (via HTTP header) | No                | Yes (noindex)

Important: robots.txt prevents crawling, not indexing. If other sites link to a disallowed page, search engines may still index the URL (without its content). To keep a page out of the index, use a noindex meta tag or X-Robots-Tag header instead — and leave the page crawlable, because a crawler must be able to fetch the page to see the directive.
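Both noindex mechanisms are standard directives; the snippets below are generic examples rather than site-specific configuration. In a page's HTML:

```html
<!-- In the page's <head>; the page must stay crawlable so this tag is seen -->
<meta name="robots" content="noindex">
```

Or as an HTTP response header, which also works for non-HTML resources such as PDFs where no meta tag can be placed:

```http
X-Robots-Tag: noindex
```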

Testing robots.txt

Before deploying changes, verify that your rules do what you expect. Google Search Console provides a robots.txt report showing which version of the file Google fetched and any parse errors, and you can also check individual URLs against your rules locally.
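One quick local check uses Python's standard-library urllib.robotparser, which evaluates allow/disallow rules the way a well-behaved crawler does. The rules below are a hypothetical example:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content to test against
rules = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("*", "https://example.com/admin/settings"))  # False
print(parser.can_fetch("*", "https://example.com/blog/post"))       # True
```

Point parser.set_url() at your live file and call parser.read() to test the deployed version instead of a local string.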

Common Mistakes

Best Practices

Conclusion

robots.txt is small but powerful. A well-configured file helps search engines crawl your site efficiently, focusing on valuable pages while skipping duplicates and internal tools. Always test changes, and remember: robots.txt controls crawling, not indexing.
