
robots.txt and AI Bots: GPTBot, ClaudeBot, Google-Extended

Short answer. AI crawlers (OpenAI's GPTBot, Anthropic's ClaudeBot, PerplexityBot, Google-Extended, CCBot) respect robots.txt just like search bots do. Using User-agent plus Allow/Disallow directives, you decide which bots may read your content for training and AI answers, and which to block. The choice depends on strategy: openness for citations, or protection of your content.

What AI bots are and why they want your site

AI crawlers gather content for two purposes: training models and building answers in real time (RAG, AI search). Open access increases the chance your brand gets cited in ChatGPT, Claude or Perplexity. Closed access protects unique content from being used without attribution.

robots.txt is an agreement, not a technical barrier. Well-behaved bots (GPTBot, ClaudeBot) honor it. For hard blocking you need server-side rules or a WAF.
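As a rough illustration of what a server-side rule can look like, here is a minimal sketch of a WSGI middleware that answers 403 to requests whose User-Agent header contains a listed bot name. The function name `block_ai_bots` and the `BLOCKED_AGENTS` list are hypothetical, chosen for this example. In practice this kind of rule usually lives in the web server or WAF configuration, and User-Agent strings can be spoofed, so treat it as a sketch rather than a complete defense.

```python
# Hypothetical block list for illustration; adjust to your own policy.
BLOCKED_AGENTS = ("GPTBot", "CCBot")


def block_ai_bots(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app and return 403 for listed user agents."""
    lowered = tuple(name.lower() for name in blocked)

    def middleware(environ, start_response):
        # WSGI exposes the User-Agent request header under this key.
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name in agent for name in lowered):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Unlike robots.txt, this refuses the request itself, so it also applies to bots that ignore the file but still identify themselves.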

Table: major AI crawlers

User-agent | Who | Purpose
GPTBot | OpenAI | Training GPT models
OAI-SearchBot | OpenAI | Indexing for ChatGPT Search
ClaudeBot | Anthropic | Training and indexing for Claude
PerplexityBot | Perplexity | AI search and answers
Google-Extended | Google | Gemini training (does not affect Search)
CCBot | Common Crawl | Open dataset used by many models

Example: open everything to AI bots

If your strategy is maximum citability, allow crawlers access to all public content:

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://enterno.io/sitemap.xml

Don't forget the Sitemap: line — it helps bots discover every page.

Example: block training, allow AI search

A common strategy is to block training crawlers but keep live AI-search bots open so you still get cited.

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://enterno.io/sitemap.xml

Here GPTBot, Google-Extended and CCBot (training) are blocked, while OAI-SearchBot and PerplexityBot (live search) are allowed. Bots that are not listed, such as ClaudeBot, remain allowed by default.
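One way to sanity-check a policy like this before deploying it is Python's standard `urllib.robotparser`. The sketch below parses the selective policy as a string and reports which user-agent tokens may fetch a page. The helper name `access_report` and the `example.com` URL are assumptions made for the example, and the standard-library parser is only an approximation of how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The selective policy from the example (Sitemap line omitted).
POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
"""

AGENTS = ["GPTBot", "Google-Extended", "CCBot",
          "OAI-SearchBot", "PerplexityBot", "ClaudeBot"]


def access_report(robots_txt, agents, url):
    """Map each user-agent token to True (allowed) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


report = access_report(POLICY, AGENTS, "https://example.com/article")
for agent, allowed in report.items():
    print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

With this policy the three training tokens come back blocked and the rest allowed, including ClaudeBot, which has no group of its own.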

The Content-Signal directive

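A newer initiative is the Content-Signal directive, introduced by Cloudflare in its Content Signals Policy (the IETF is separately working on standard AI-preference signals). It declares permitted uses of your content: search, AI training, AI input. It's a more granular tool than a blunt Disallow.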

User-agent: *
Content-Signal: search=yes, ai-train=no, ai-input=yes
Allow: /

In this example search is allowed, model training is not, and use as context for an AI answer (ai-input) is. Support depends on the bot.
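To make the key=value structure concrete, here is a minimal sketch that parses a Content-Signal line into a dictionary of booleans. The function name `parse_content_signal` is hypothetical, and the parsing rules (comma-separated pairs, `yes`/`no` values, anything else skipped) are assumptions based on the example line, not a reference implementation of the proposal.

```python
def parse_content_signal(line):
    """Parse 'Content-Signal: a=yes, b=no' into {'a': True, 'b': False}."""
    name, _, value = line.partition(":")
    if name.strip().lower() != "content-signal":
        raise ValueError("not a Content-Signal line")

    signals = {}
    for part in value.split(","):
        key, sep, flag = part.partition("=")
        flag = flag.strip().lower()
        if not sep or flag not in ("yes", "no"):
            continue  # skip malformed or unrecognized items
        signals[key.strip().lower()] = (flag == "yes")
    return signals


signals = parse_content_signal(
    "Content-Signal: search=yes, ai-train=no, ai-input=yes"
)
print(signals)  # {'search': True, 'ai-train': False, 'ai-input': True}
```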

Don't block useful paths by accident. Blocking AI bots from your API documentation or /assets/ is unnecessary, and a stray Disallow: / inside a wildcard User-agent: * block will lock out every crawler that has no group of its own.
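A stray wildcard block is easy to catch automatically. The sketch below is a simplified lint, not a full robots.txt parser: it only reports whether any group that names `User-agent: *` also contains `Disallow: /`, and it ignores precedence between Allow and Disallow. The function name `wildcard_blocks_everything` is hypothetical.

```python
def wildcard_blocks_everything(robots_txt):
    """Return True if a 'User-agent: *' group contains 'Disallow: /'."""
    agents, in_rules = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue  # blank or unparseable line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rule lines starts a new group.
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                return True
    return False


print(wildcard_blocks_everything("User-agent: *\nDisallow: /\n"))  # True
```

Running a check like this in CI or before each deploy is one way to keep a site-wide block from slipping into production unnoticed.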

What to always block

  • Private areas: /admin/, /dashboard/, login pages.
  • Utility paths: internal APIs, cart, parameterized search.
  • Duplicates: pages with UTM tags and session parameters.

These rules are the same for search and AI bots. We cover robots.txt fundamentals in the robots.txt guide.
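The shared block list can be sketched as a small generator plus a check. The paths in `PRIVATE_PATHS` and the helper `build_rules` are hypothetical examples, not a recommended list for any particular site. Note also that `urllib.robotparser` matches plain path prefixes only, so wildcard patterns for URL parameters would need a different checker.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical private and utility paths for illustration.
PRIVATE_PATHS = ["/admin/", "/dashboard/", "/login", "/cart/"]


def build_rules(paths, agent="*"):
    """Build one robots.txt group that disallows the given path prefixes."""
    lines = [f"User-agent: {agent}"]
    lines += [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


robots_txt = build_rules(PRIVATE_PATHS)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The same wildcard group applies to search and AI bots alike.
for agent in ("Googlebot", "GPTBot"):
    private_ok = parser.can_fetch(agent, "https://example.com/admin/users")
    public_ok = parser.can_fetch(agent, "https://example.com/blog/post")
    print(agent, "private:", private_ok, "public:", public_ok)
```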

How to validate and strengthen

After editing robots.txt, check the syntax and make sure the bots you want aren't blocked by accident. Complement the file with a content map — see the llms.txt guide — and a correct sitemap.xml. A free tool can assess your site's readiness for AI crawlers end to end.

FAQ

Do AI bots obey robots.txt?

The major well-behaved crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot) do. To reliably block bad actors you need server-side rules.

Does blocking Google-Extended affect Google Search?

No. Google-Extended only governs use for Gemini and has no effect on Google Search indexing, which is handled by Googlebot, a separate crawler.

Should I block all AI bots?

It depends on your goals. Blocking protects content but costs you citations in AI answers, a growing channel for traffic and brand awareness.

What about CCBot?

CCBot builds Common Crawl, an open dataset many models train on. Whether to allow it depends on your policy toward training data.

Does Content-Signal work today?

It's an evolving initiative; support depends on the bot. Adding the directive is safe — bots that don't support it simply ignore it.
