Robots.txt & Sitemap Validator

Check robots.txt syntax, crawl rules, and sitemap accessibility for any domain

About robots.txt

The robots.txt file tells search engine crawlers which URLs they can access on your site. It is used mainly to manage crawler traffic and prevent overloading your server. A well-configured robots.txt helps ensure search engines can discover and index your important content while keeping private areas protected.
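For illustration, a minimal robots.txt might look like this (the domain and paths are placeholders):

```
User-agent: *
Disallow: /private/
Allow: /

Sitemap: https://example.com/sitemap.xml
```

This allows all crawlers everywhere except /private/, and points them to the sitemap.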

Robots.txt Checker

This tool analyzes your site's robots.txt file, which controls search engine crawler access to pages. It checks the rules for every user agent, the Allow/Disallow directives, Crawl-delay, and Sitemap links. An incorrect robots.txt can deindex important pages or expose internal sections.

Common robots.txt mistakes include blocking CSS/JS files (which breaks rendering for Google), omitting the Sitemap directive, writing Allow/Disallow paths without a leading slash, and setting conflicting rules for the same path. Our validator catches these issues and shows which URLs are blocked for each user agent.
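For example, a path without a leading slash is invalid and typically ignored, while two rules matching the same path can conflict (these fragments are illustrative):

```
# Invalid: the path must start with "/"
Disallow: admin/

# Correct
Disallow: /admin/

# Conflicting rules for the same path: Google resolves this by
# longest match (Allow wins here); other bots may behave differently
Allow: /downloads/
Disallow: /downloads/
```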

Always test robots.txt changes before deploying to production — a single typo can deindex your entire site. After validation, check broken links to ensure blocked pages aren't linked from active content. Review your security headers to make sure sensitive paths are properly protected.
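One way to sanity-check a rules file before deploying is Python's standard-library parser. This is a minimal sketch: the rules string and URLs below are illustrative placeholders, not your real configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules you are about to deploy
rules = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Verify important pages stay crawlable and private ones stay blocked
print(parser.can_fetch("*", "https://example.com/blog/post"))    # True
print(parser.can_fetch("*", "https://example.com/admin/users"))  # False
```

Running a check like this in CI catches the "single typo deindexes the site" scenario before it reaches production.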

Frequently Asked Questions

What is robots.txt?

robots.txt is a text file at the root of a site that tells search bots which pages they may or may not crawl. It is a recommendation, not a mandatory block — malicious bots may ignore it.

What is the difference between robots.txt and meta robots?

robots.txt blocks crawling (the bot will not visit the page). Meta robots (noindex) blocks indexing (the bot visits the page but does not add it to the index). To reliably keep a page out of the index, use meta noindex and leave the page crawlable: if robots.txt blocks a page, the bot never sees the meta noindex tag, and the URL can still appear in results.
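To illustrate, a page meant to stay out of the index carries this tag in its <head> (and must not be blocked in robots.txt, so the crawler can actually read it):

```
<meta name="robots" content="noindex">
```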

How to correctly specify Sitemap in robots.txt?

Add a line such as Sitemap: https://example.com/sitemap.xml — conventionally at the end of the file, though the directive is valid anywhere. The URL must be absolute. Multiple Sitemap lines can be specified. This helps bots find your sitemaps faster.
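For example, a site with a main sitemap and a news sitemap (URLs are placeholders) would list both:

```
User-agent: *
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
```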

What is Crawl-delay?

Crawl-delay is a robots.txt directive that sets a pause between bot requests in seconds. Yandex and Bing support it. Google ignores Crawl-delay — Google's crawl rate is configured in Search Console.
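For example, to ask a supporting bot to wait 10 seconds between requests (Bingbot shown as an illustration):

```
User-agent: Bingbot
Crawl-delay: 10
```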

What are common robots.txt mistakes?

Common mistakes: blocking CSS/JS files (prevents rendering), Disallow: / (blocks entire site), missing file (bot considers everything allowed), blocking /api/ without Allow for /api/docs, incorrect User-agent capitalization.

How to check robots.txt?

Our tool analyzes syntax, checks file accessibility, finds conflicting rules, and warns about potential issues. You can also use Google Search Console to test specific URLs.