Robots.txt & Sitemap Validator

Check robots.txt syntax, crawl rules, and sitemap accessibility for any domain

About robots.txt

The robots.txt file tells search engine crawlers which URLs they can access on your site. It is used mainly to manage crawler traffic and prevent overloading your server. A well-configured robots.txt helps ensure search engines can discover and index your important content while keeping private areas protected.
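For illustration, a minimal robots.txt might look like this (the domain and paths are placeholders):

```
User-agent: *
Disallow: /private/
Allow: /

Sitemap: https://example.com/sitemap.xml
```

This allows all crawlers everywhere except /private/, and points them to the sitemap.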

Robots.txt Checker

This tool analyzes your site's robots.txt file, which controls search engine crawler access to pages. It checks the rules for every user agent, the Allow/Disallow directives, Crawl-delay, and Sitemap links. An incorrect robots.txt can deindex important pages or expose internal sections.

Common robots.txt mistakes include blocking CSS/JS files (which breaks rendering for Google), omitting the Sitemap directive, writing Allow/Disallow paths without a leading slash, and setting conflicting rules for the same path. Our validator catches these issues and shows which URLs are blocked for each user agent.
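For example, a path without a leading slash is invalid and typically ignored, while two rules matching the same path can conflict (these fragments are illustrative):

```
# Invalid: the path must start with "/"
Disallow: admin/

# Correct
Disallow: /admin/

# Conflicting rules for the same path: Google resolves this by
# longest match (Allow wins here); other bots may behave differently
Allow: /downloads/
Disallow: /downloads/
```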

Always test robots.txt changes before deploying to production — a single typo can deindex your entire site. After validation, check broken links to ensure blocked pages aren't linked from active content. Review your security headers to make sure sensitive paths are properly protected.
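One way to sanity-check a rules file before deploying is Python's standard-library parser. This is a minimal sketch: the rules string and URLs below are illustrative placeholders, not your real configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules you are about to deploy
rules = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Verify important pages stay crawlable and private ones stay blocked
print(parser.can_fetch("*", "https://example.com/blog/post"))    # True
print(parser.can_fetch("*", "https://example.com/admin/users"))  # False
```

Running a check like this in CI catches the "single typo deindexes the site" scenario before it reaches production.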

Frequently Asked Questions

What is robots.txt?

robots.txt is a text file at the root of a site that tells search bots which pages they may or may not crawl. It is a recommendation, not a mandatory block — malicious bots may ignore it.

What is the difference between robots.txt and meta robots?

robots.txt blocks crawling (the bot will not visit the page). Meta robots (noindex) blocks indexing (the bot visits the page but does not add it to the index). To reliably keep a page out of the index, use meta noindex and leave the page crawlable: if robots.txt blocks a page, the bot never sees the meta noindex tag, and the URL can still appear in results.
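To illustrate, a page meant to stay out of the index carries this tag in its <head> (and must not be blocked in robots.txt, so the crawler can actually read it):

```
<meta name="robots" content="noindex">
```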

How to correctly specify Sitemap in robots.txt?

Add a line such as Sitemap: https://example.com/sitemap.xml — conventionally at the end of the file, though the directive is valid anywhere. The URL must be absolute. Multiple Sitemap lines can be specified. This helps bots find your sitemaps faster.
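For example, a site with a main sitemap and a news sitemap (URLs are placeholders) would list both:

```
User-agent: *
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
```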

What is Crawl-delay?

Crawl-delay is a robots.txt directive that sets a pause between bot requests in seconds. Yandex and Bing support it. Google ignores Crawl-delay — Google's crawl rate is configured in Search Console.
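For example, to ask a supporting bot to wait 10 seconds between requests (Bingbot shown as an illustration):

```
User-agent: Bingbot
Crawl-delay: 10
```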

What are common robots.txt mistakes?

Common mistakes: blocking CSS/JS files (prevents rendering), Disallow: / (blocks entire site), missing file (bot considers everything allowed), blocking /api/ without Allow for /api/docs, incorrect User-agent capitalization.

How to check robots.txt?

Our tool analyzes syntax, checks file accessibility, finds conflicting rules, and warns about potential issues. You can also use Google Search Console to test specific URLs.