Does a Disallow on GPTBot block ChatGPT Search?

No. GPTBot and OAI-SearchBot are two distinct bots. GPTBot crawls to feed OpenAI model training. OAI-SearchBot crawls for ChatGPT Search. A Disallow on GPTBot leaves OAI-SearchBot free to crawl. You must add a separate rule for each bot according to your objectives.

Is robots.txt the only way to block AI bots?

No. You can also use the meta robots tag (noai, noimageai), the HTTP X-Robots-Tag header, or WAF/Cloudflare rules by user-agent. The noai meta tag is not a standard: it is recognised by some bots (notably training crawlers) but not all. robots.txt remains the most universal and easiest signal to maintain.
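
To illustrate the "rules by user-agent" option, here is a minimal Python sketch of the same idea at application level: a WSGI wrapper that answers 403 to listed bots. It is not how Cloudflare or any particular WAF is configured; the names BLOCKED_TOKENS, block_ai_bots, demo_app and status_for are hypothetical, and the token list is only an example policy.

```python
# Hypothetical list of user-agent tokens to refuse; adjust to your own policy.
BLOCKED_TOKENS = ("GPTBot", "CCBot", "Bytespider")


def block_ai_bots(app, blocked=BLOCKED_TOKENS):
    """Wrap a WSGI app so that requests from listed user-agents get a 403."""
    lowered = tuple(token.lower() for token in blocked)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in lowered):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else reaches the real application unchanged.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in for the real site: always answers 200."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(user_agent):
    """Send one fake request through the wrapped app and return its status line."""
    seen = []
    wrapped = block_ai_bots(demo_app)
    wrapped({"HTTP_USER_AGENT": user_agent},
            lambda status, headers: seen.append(status))
    return seen[0]
```

Calling status_for with a user-agent that contains GPTBot returns "403 Forbidden", while an ordinary browser string returns "200 OK". Unlike robots.txt, this kind of rule is enforced by your server, but it still depends on the bot sending its real user-agent.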

How do you verify that your robots.txt rules are being applied?

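Combine three checks. In Google Search Console, the robots.txt report (Settings > robots.txt) shows which file Google fetched and any parsing errors; it covers Google's crawlers only. With curl, request the file while sending a bot's user-agent (curl -A "GPTBot" https://yoursite.com/robots.txt) to confirm that the file is actually served to that bot and is not blocked or altered by a WAF or CDN. In server logs, check that each bot's requests stay within what you allow. Note: logs are also how you spot a poorly configured or malicious bot that ignores robots.txt.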

Should you use a Crawl-delay for AI bots?

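Only if your server is under pressure. GPTBot, OAI-SearchBot and PerplexityBot generally respect Retry-After headers and 429 (Too Many Requests) codes. For bots that honour it, a Crawl-delay of 10-30 seconds can reduce load without blocking crawling; the directive is not part of the robots.txt standard (RFC 9309), so support varies. Note: Googlebot ignores Crawl-delay, and the crawl rate setting in GSC was retired in early 2024; to slow Googlebot, Google's guidance is to return 500, 503 or 429 responses temporarily.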
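
As a sketch of the 429 and Retry-After mechanism mentioned here, the Python function below decides whether to serve or defer a request from a given bot. The 10-second spacing, the throttle name and the in-memory dictionary are assumptions for illustration; a production setup would more likely do this in the web server, CDN or WAF.

```python
import math
import time

MIN_INTERVAL_SECONDS = 10  # assumed minimum spacing between requests from one bot
_last_accepted = {}        # bot token -> time of its last accepted request


def throttle(bot_token, now=None):
    """Return (status, headers) for a request from bot_token.

    Requests arriving sooner than MIN_INTERVAL_SECONDS after the last
    accepted one get a 429 with a Retry-After header in whole seconds.
    """
    now = time.monotonic() if now is None else now
    previous = _last_accepted.get(bot_token)
    if previous is not None:
        elapsed = now - previous
        if elapsed < MIN_INTERVAL_SECONDS:
            wait = math.ceil(MIN_INTERVAL_SECONDS - elapsed)
            return "429 Too Many Requests", {"Retry-After": str(wait)}
    _last_accepted[bot_token] = now
    return "200 OK", {}
```

A bot that is accepted at t=100 and comes back at t=103 receives a 429 with Retry-After: 7, and is accepted again from t=110. Each bot token is tracked separately, so throttling one crawler does not slow the others.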

robots.txt and AI bots: complete configuration guide 2026

Complete reference: AI bot user-agents in 2026

Each AI company deploys multiple bots with distinct roles. Here is the complete reference of user-agent strings to know:

User-agent	Company	Role	Impact if blocked
GPTBot	OpenAI	Model training	Exclusion from future OpenAI corpora
OAI-SearchBot	OpenAI	ChatGPT Search (retrieval)	Not cited in ChatGPT Search
ChatGPT-User	OpenAI	User-initiated page fetches in ChatGPT	No ChatGPT browsing on your site
PerplexityBot	Perplexity	Perplexity indexation + retrieval	Not cited in Perplexity
Perplexity-User	Perplexity	Perplexity user queries	Reduced Perplexity visibility
ClaudeBot	Anthropic	Claude training + retrieval	Exclusion from Anthropic corpus
Claude-Web	Anthropic	Claude web browsing	No Claude browsing on your site
anthropic-ai	Anthropic	Generic Anthropic crawler	Exclusion from Anthropic corpus
Google-Extended	Google	Gemini training	Exclusion from Gemini corpus (not SERPs)
Applebot-Extended	Apple	Apple Intelligence training	Exclusion from Apple Intelligence corpus
CCBot	Common Crawl	Open source corpus (used by many LLMs)	Exclusion from many open-source LLM corpora
cohere-ai	Cohere	Cohere model training	Exclusion from Cohere corpus
meta-externalagent	Meta	Llama / Meta AI training	Exclusion from Meta corpus
Bytespider	ByteDance	ByteDance model training	Exclusion from ByteDance corpus

The 4 standard robots.txt configurations

Configuration 1 - Allow everything (maximum visibility strategy)

No specific directives for AI bots: they follow the general rules of your robots.txt. Recommended if your objective is maximum visibility across all LLMs and AI engines.

User-agent: *
Disallow:

# Sitemap
Sitemap: https://yoursite.com/sitemap.xml

Configuration 2 - Block training, allow retrieval

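Block training bots (GPTBot, Google-Extended, CCBot, cohere-ai, meta-externalagent, Bytespider, anthropic-ai, Applebot-Extended) while allowing retrieval bots (OAI-SearchBot, ChatGPT-User, PerplexityBot, Perplexity-User, ClaudeBot). You keep visibility in ChatGPT Search and Perplexity without feeding training corpora. One caveat: the reference table lists ClaudeBot as serving both training and retrieval, so leaving it allowed also leaves Anthropic training open; block it too if that matters more to you than being retrievable by Claude.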

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Retrieval allowed
User-agent: OAI-SearchBot
Disallow:

User-agent: ChatGPT-User
Disallow:

User-agent: PerplexityBot
Disallow:

User-agent: Perplexity-User
Disallow:

User-agent: ClaudeBot
Disallow:

User-agent: *
Disallow:

Sitemap: https://yoursite.com/sitemap.xml
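
If you maintain this file for several sites, generating it from two lists avoids copy-paste slips. The Python sketch below rebuilds a file equivalent to Configuration 2; build_robots_txt, TRAINING_BOTS and RETRIEVAL_BOTS are hypothetical names, and the lists simply mirror the groups above.

```python
def build_robots_txt(blocked, allowed, sitemap_url):
    """Render one group per bot, then the catch-all group and the sitemap line."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in blocked]
    groups += [f"User-agent: {bot}\nDisallow:" for bot in allowed]
    groups.append("User-agent: *\nDisallow:")
    groups.append(f"Sitemap: {sitemap_url}")
    # Groups are separated by a blank line, as in the hand-written file.
    return "\n\n".join(groups) + "\n"


TRAINING_BOTS = [
    "GPTBot", "Google-Extended", "CCBot", "cohere-ai",
    "meta-externalagent", "Bytespider", "anthropic-ai", "Applebot-Extended",
]
RETRIEVAL_BOTS = [
    "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
    "Perplexity-User", "ClaudeBot",
]

robots_txt = build_robots_txt(
    TRAINING_BOTS, RETRIEVAL_BOTS, "https://yoursite.com/sitemap.xml"
)
print(robots_txt)
```

Moving a bot from one list to the other is then the only change needed when your policy for it changes.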

Configuration 3 - Block everything (defensive strategy)

Block all known AI bots. Use this only if you have strong legal or commercial reasons (proprietary content, copyright, direct competition with LLMs). Impact: near-absence from LLM and AI engine responses.

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Disallow:

Sitemap: https://yoursite.com/sitemap.xml

Configuration 4 - Selective folder blocking

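Allow general crawling but block specific sections (paid content, proprietary data, archives). Useful for media outlets and SaaS products with a public and a private part. The example below restricts GPTBot only; repeat the group for every other bot you want to keep out of those folders.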

User-agent: GPTBot
Disallow: /premium-content/
Disallow: /proprietary-data/
Disallow: /app/

User-agent: *
Disallow:

Sitemap: https://yoursite.com/sitemap.xml
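
A quick way to sanity-check a folder-level configuration is to run it through a parser. The sketch below uses Python's standard-library urllib.robotparser on the rules above; the helper name allowed and the sample paths are assumptions, and real crawlers use their own parsers, so this checks the file rather than any bot's behaviour.

```python
from urllib.robotparser import RobotFileParser

CONFIG_4 = """\
User-agent: GPTBot
Disallow: /premium-content/
Disallow: /proprietary-data/
Disallow: /app/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(CONFIG_4.splitlines())


def allowed(bot, path):
    """True if the parsed rules let `bot` fetch `path` on the example host."""
    return parser.can_fetch(bot, "https://yoursite.com" + path)


for bot in ("GPTBot", "PerplexityBot"):
    for path in ("/blog/post", "/premium-content/report"):
        print(bot, path, allowed(bot, path))
```

GPTBot is allowed on /blog/post and refused on /premium-content/report, while PerplexityBot is allowed on both, because only the GPTBot group carries the folder rules.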

Common configuration pitfalls

Pitfall 1 - Confusing GPTBot and OAI-SearchBot

This is the most common error. A site that blocks GPTBot thinking it is blocking ChatGPT Search has only blocked OpenAI training. OAI-SearchBot continues to crawl freely. Verify that your rules target the right user-agents for your actual objectives.
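
The effect is easy to reproduce with Python's standard-library urllib.robotparser. In this sketch, crawl_report is a hypothetical helper and the URL is a placeholder; it checks what the file says, not how OpenAI's crawlers are implemented.

```python
from urllib.robotparser import RobotFileParser

ONLY_GPTBOT_BLOCKED = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""


def crawl_report(robots_txt, bots, url="https://yoursite.com/article"):
    """Map each bot token to whether the rules allow it to fetch `url`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in bots}


report = crawl_report(
    ONLY_GPTBOT_BLOCKED, ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]
)
print(report)
```

Only GPTBot comes back as False; OAI-SearchBot and ChatGPT-User fall through to the * group and remain allowed.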

Pitfall 2 - Rule order in robots.txt

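Compliant crawlers do not stop at the first group they read. Under the robots.txt standard (RFC 9309), a bot uses the group whose User-agent line matches its name most specifically and falls back to User-agent: * only when no named group matches; the position of a group in the file does not change the result. The real pitfall is that groups are not merged: a bot with its own group ignores everything under User-agent: *, so any path restrictions written there must be repeated in the named group. Putting specific groups before the * group is still a sensible convention, because it keeps the file readable and guards against simplistic parsers that stop at the first match.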
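
To see how one real parser treats group order, the sketch below feeds the same two groups to Python's standard-library urllib.robotparser in both orders. The bot name SomeOtherBot, the helper can_crawl and the URL are placeholders. Other parsers may behave differently, so treat this as a check of one implementation, not as proof about any specific crawler.

```python
from urllib.robotparser import RobotFileParser

GENERIC_FIRST = """\
User-agent: *
Disallow: /

User-agent: OAI-SearchBot
Disallow:
"""

SPECIFIC_FIRST = """\
User-agent: OAI-SearchBot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt, bot, url="https://yoursite.com/page"):
    """Parse the given robots.txt text and ask whether `bot` may fetch `url`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(bot, url)


for name, text in (("generic first", GENERIC_FIRST),
                   ("specific first", SPECIFIC_FIRST)):
    print(name,
          can_crawl(text, "OAI-SearchBot"),
          can_crawl(text, "SomeOtherBot"))
```

Both orderings give the same answer: OAI-SearchBot is allowed through its own group, and SomeOtherBot is refused by the * group.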

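Pitfall 3 - Case sensitivity and spelling of user-agents

User-agent tokens in robots.txt are matched case-insensitively by compliant parsers (RFC 9309), so gptbot and GPTBot select the same group. Paths in Allow and Disallow rules, however, are case-sensitive: Disallow: /Premium/ does not cover /premium/. Copy user-agent tokens exactly as each company publishes them anyway (reference in the table above), because a misspelled token such as GPT-Bot silently matches nothing.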

Pitfall 4 - Forgetting Crawl-delay for aggressive bots

Some less well-behaved bots (notably CCBot and Bytespider) may ignore Crawl-delay directives. For bots that respect them, a value of 10 to 30 seconds reduces server load without blocking the crawl. For bots that ignore this directive, a WAF rule (Cloudflare) by user-agent is more effective.
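
For reference, this is what a Crawl-delay rule looks like and how a parser that supports it reads the value. The sketch uses Python's standard-library urllib.robotparser; ExampleBot is a hypothetical token, and whether a given real bot honours the directive is up to that bot.

```python
from urllib.robotparser import RobotFileParser

WITH_DELAY = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow:

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(WITH_DELAY.splitlines())

# crawl_delay() returns the delay in seconds for the matching group,
# or None when that group does not set one.
example_delay = rp.crawl_delay("ExampleBot")
default_delay = rp.crawl_delay("GPTBot")
print(example_delay, default_delay)
```

ExampleBot gets a delay of 10 seconds, while GPTBot falls back to the * group, which sets none. The delay is attached to one group only, so it has to be repeated in every group where you want it.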

Pitfall 5 - Not updating robots.txt after new bots

New AI bots appear regularly. In 2025, Amazon Alexa AI, Grok (xAI), and several open-source LLM crawlers were deployed. Check and update your robots.txt quarterly by consulting official announcements from major AI companies.

Verifying and testing your configuration

Test via curl

Simulate the user-agent of each bot to verify what it sees:

# Test as GPTBot
curl -A "Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)" https://yoursite.com/robots.txt

# Test as PerplexityBot
curl -A "PerplexityBot/1.0" https://yoursite.com/robots.txt

Test via Google Search Console

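The legacy robots.txt testing tool, which let you test a URL against a chosen user-agent, was retired in 2023. Its replacement, the robots.txt report in GSC (Settings > robots.txt), shows the robots.txt files Google found for your site, when they were last fetched, and any warnings or errors. It reflects Google's crawlers only, so test other user-agents with curl or a robots.txt parser.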

Server log monitoring

Nginx/Apache/Cloudflare logs show requests from each bot with their actual user-agent. Filter with grep -iE "gptbot|oai-searchbot|perplexitybot" access.log (replace access.log with your own log path) to see their activity. This is also how you detect bots that ignore your robots.txt: look for requests to paths you have disallowed for them.
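
When you want counts, not raw lines, the same filter can be expressed in a few lines of Python. This is a minimal sketch: count_ai_bot_hits is a hypothetical helper, and SAMPLE_LINES are invented entries in a combined-log style, not real traffic.

```python
import re
from collections import Counter

# Same three bots as the grep filter; extend the alternation as needed.
AI_BOT_PATTERN = re.compile(r"gptbot|oai-searchbot|perplexitybot", re.IGNORECASE)


def count_ai_bot_hits(log_lines):
    """Count log lines per AI bot, keyed by the lowercased token that matched."""
    hits = Counter()
    for line in log_lines:
        match = AI_BOT_PATTERN.search(line)
        if match:
            hits[match.group(0).lower()] += 1
    return hits


# Invented sample entries, for illustration only.
SAMPLE_LINES = [
    '203.0.113.5 - - [01/Jan/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.5 - - [01/Jan/2026:10:00:02 +0000] "GET /blog/post HTTP/1.1" 200 9021 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [01/Jan/2026:10:01:10 +0000] "GET /blog/post HTTP/1.1" 200 9021 "-" '
    '"PerplexityBot/1.0"',
    '192.0.2.44 - - [01/Jan/2026:10:02:30 +0000] "GET / HTTP/1.1" 200 4310 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"',
]

print(count_ai_bot_hits(SAMPLE_LINES))
```

To run it on a real file, pass an open file object instead of the sample list, for example count_ai_bot_hits(open("access.log")), where access.log stands for your own log path.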

FAQ - robots.txt and AI bots

Does a Disallow on GPTBot block ChatGPT Search?: No. GPTBot and OAI-SearchBot are two distinct bots. Blocking GPTBot leaves OAI-SearchBot free to crawl. You must target each bot separately according to your objectives.
Is robots.txt the only way to block AI bots?: No. The meta robots tag (noai, noimageai), the X-Robots-Tag header, and WAF/Cloudflare rules are alternatives. robots.txt remains the most universal and easiest signal to maintain.
How do you verify that your robots.txt rules are being applied?: Via the GSC robots.txt report (Google's crawlers only), via curl simulating the user-agent, and via server logs to confirm that bots respect your directives.
Should you use a Crawl-delay for AI bots?: Only if your server is under pressure. Well-configured bots (GPTBot, PerplexityBot) respect 429 and Retry-After. Note: Googlebot ignores Crawl-delay, and the GSC crawl rate setting was retired in early 2024; return 429 or 503 temporarily to slow it down.

robots.txt AI bots checklist (7 points)

The robots.txt configuration matches your strategy (maximum visibility, block training only, defensive, or selective folder blocking).
GPTBot and OAI-SearchBot have separate rules if your objectives differ.
User-agent tokens are spelled exactly as published (GPTBot, OAI-SearchBot, PerplexityBot).
Specific groups precede the generic User-agent: * group, and each named group carries every path rule that bot needs.
The file has been tested via GSC and/or curl for each relevant bot.
Server logs are configured to monitor AI bot activity.
A quarterly review is planned to incorporate new AI bots.