Check robots.txt syntax, crawl rules, and sitemap accessibility for any domain
The robots.txt checker parses your file, shows which user agents are allowed or denied, and highlights common errors (invalid syntax, blocking important paths, CSS/JS in Disallow). It also validates Sitemap references and Yandex-specific Clean-param directives.
The tool analyzes your site's robots.txt file, which controls search engine crawler access to pages. It checks the rules for every user agent, Allow/Disallow directives, Crawl-delay, and Sitemap links. An incorrect robots.txt can deindex important pages or expose internal sections.
Common robots.txt mistakes include blocking CSS/JS files (which breaks rendering for Google), omitting the Sitemap directive, writing Allow/Disallow paths without a leading slash, and setting conflicting rules for the same path. Our validator catches these issues and shows which URLs are blocked for each user agent.
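For a rough illustration of that per-user-agent check, here is a sketch using Python's standard-library urllib.robotparser; the robots.txt contents and URLs below are invented examples, not the validator's actual implementation.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt used only for illustration: CSS/JS assets are blocked
# for all bots, while Googlebot gets its own group with a different rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/

User-agent: Googlebot
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Show which sample URLs are blocked for each user agent.
for agent in ("*", "Googlebot"):
    for path in ("/assets/app.css", "/private/reports", "/about"):
        url = f"https://example.com{path}"
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(f"{agent:10} {path:18} {verdict}")
```

Because Googlebot has its own group, it ignores the generic rules: /assets/app.css comes back blocked for * but allowed for Googlebot, which is exactly the kind of per-agent difference the report surfaces.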
Always test robots.txt changes before deploying to production — a single typo can deindex your entire site. After validation, check broken links to ensure blocked pages aren't linked from active content. Review your security headers to make sure sensitive paths are properly protected.
robots.txt controls which pages search bots can see. Incorrect directives can accidentally block the entire site from indexing or expose administrative sections.
Parse robots.txt per RFC 9309: all User-agent, Allow/Disallow, Crawl-delay, and Sitemap directives.
Enter a specific URL and user agent to find out whether that bot is allowed to fetch it.
Automatically show status for GPTBot, ClaudeBot, PerplexityBot, Googlebot.
All Sitemap: directives in one place with quick links for verification.
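These checks can be approximated with the same standard-library parser; in this sketch, example.com, the test URL, and the assumption that its robots.txt is reachable are placeholders.

```python
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"          # assumed target site
TEST_URL = f"{SITE}/some/page"        # hypothetical URL to test

parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()                          # fetch and parse the live file

# Per-bot status for the crawlers shown in the report
for bot in ("GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"):
    allowed = parser.can_fetch(bot, TEST_URL)
    delay = parser.crawl_delay(bot)    # None if no Crawl-delay applies
    print(f"{bot:14} allowed={allowed} crawl-delay={delay}")

# All Sitemap: directives in one place (Python 3.8+; None if the file has none)
print(parser.site_maps())
```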
crawl directive audit
post-deploy check
indexation control
block unwanted crawlers
User-agent: * applies to all bots, including AI crawlers.
Sitemap: https://example.com/sitemap.xml helps bots find all pages.
robots.txt check history and change monitoring for your site.
robots.txt is a text file at the root of a site that tells search bots which pages they may or may not crawl. It is a recommendation, not a mandatory block; malicious bots may ignore it.
robots.txt blocks crawling (the bot will not visit the page). Meta robots (noindex) blocks indexing (the bot visits the page but does not add it to the index). They are not interchangeable: to keep a page out of the index, use noindex and leave the URL crawlable, because a bot blocked by robots.txt never sees the noindex tag.
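To make the crawl-versus-index distinction concrete, here is a standard-library sketch that first asks robots.txt whether a bot may fetch the page at all, and only then looks for noindex signals; the URL and bot name are placeholders.

```python
import re
import urllib.request
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

URL = "https://example.com/private/page"   # hypothetical page to audit
BOT = "Googlebot"

root = "{0.scheme}://{0.netloc}".format(urlparse(URL))
robots = RobotFileParser()
robots.set_url(urljoin(root, "/robots.txt"))
robots.read()

if not robots.can_fetch(BOT, URL):
    # Crawling is blocked, so any noindex on the page is invisible to the bot.
    print("Blocked by robots.txt: the bot will never see a meta noindex here.")
else:
    with urllib.request.urlopen(URL) as resp:
        header_noindex = "noindex" in (resp.headers.get("X-Robots-Tag") or "").lower()
        body = resp.read(65536).decode("utf-8", errors="replace")
    meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', body, re.I)
    meta_noindex = bool(meta and "noindex" in meta.group(0).lower())
    print(f"Crawlable; noindex via header={header_noindex}, via meta tag={meta_noindex}")
```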
Add the line Sitemap: https://example.com/sitemap.xml at the end of the file. The URL must be absolute. Multiple sitemaps can be specified. This helps bots find the sitemap faster.
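A small helper along these lines, assuming a local robots.txt file and example sitemap URLs; the only rule it enforces is that each URL is absolute.

```python
from urllib.parse import urlparse

def append_sitemaps(robots_path, sitemap_urls):
    """Append Sitemap: lines to an existing robots.txt file."""
    for url in sitemap_urls:
        parts = urlparse(url)
        if not (parts.scheme and parts.netloc):
            raise ValueError(f"Sitemap URL must be absolute: {url}")
    with open(robots_path, "a", encoding="utf-8") as fh:
        for url in sitemap_urls:
            fh.write(f"\nSitemap: {url}")

# Example: several sitemaps can be declared in the same file.
append_sitemaps("robots.txt", [
    "https://example.com/sitemap.xml",
    "https://example.com/blog/sitemap.xml",
])
```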
Crawl-delay is a robots.txt directive that sets a pause between bot requests in seconds. Yandex and Bing support it. Google ignores Crawl-delay — Google's crawl rate is configured in Search Console.
Common mistakes: blocking CSS/JS files (prevents rendering), Disallow: / (blocks entire site), missing file (bot considers everything allowed), blocking /api/ without Allow for /api/docs, incorrect User-agent capitalization.
Our tool analyzes syntax, checks file accessibility, finds conflicting rules, and warns about potential issues. You can also use Google Search Console to test specific URLs.
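As a rough idea of what such an analysis involves, the sketch below flags a few of the mistakes listed above; it is a simplified heuristic, not the tool's actual rule set.

```python
def lint_robots(text):
    """Flag a few common robots.txt mistakes (simplified heuristic checks)."""
    warnings = []
    current_agent = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            current_agent = value
        elif field in ("allow", "disallow") and value:
            if not value.startswith(("/", "*")):
                warnings.append(f"line {number}: path should start with '/': {raw.strip()}")
            if field == "disallow" and value == "/" and current_agent == "*":
                warnings.append(f"line {number}: 'Disallow: /' blocks the entire site for all bots")
            if field == "disallow" and value.endswith((".css", ".js")):
                warnings.append(f"line {number}: blocking CSS/JS can break rendering for Google")
    return warnings

print(lint_robots("User-agent: *\nDisallow: /\nDisallow: assets/app.css"))
```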
Longer-form reading on this topic from the knowledge base.
Set up continuous monitoring and get an alert when something breaks. No manual runs to remember.
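Conceptually, such monitoring can be as simple as the following sketch: fetch robots.txt on a schedule, hash the body, and alert when the hash changes; the URL, interval, and alert channel are placeholders.

```python
import hashlib
import time
import urllib.request

ROBOTS_URL = "https://example.com/robots.txt"   # assumed site to watch
CHECK_INTERVAL = 3600                            # seconds between checks

def fetch_hash(url):
    with urllib.request.urlopen(url) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

previous = fetch_hash(ROBOTS_URL)
while True:
    time.sleep(CHECK_INTERVAL)
    current = fetch_hash(ROBOTS_URL)
    if current != previous:
        # Placeholder for a real alert (email, webhook, chat message, ...)
        print("robots.txt changed, review the new rules")
        previous = current
```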