What is the difference between GPTBot and OAI-SearchBot?

GPTBot crawls to feed the training of OpenAI models. OAI-SearchBot crawls for ChatGPT Search, the real-time search feature integrated in ChatGPT. They are two distinct bots with different crawl policies. Blocking GPTBot does not prevent ChatGPT Search from citing you, and vice versa.

Does ChatGPT Search cite sources like Perplexity?

Yes, but with a different philosophy. Perplexity systematically displays its sources and builds its answer almost exclusively from real-time retrieval. ChatGPT Search combines the memory of the GPT-4o model with selective retrieval: cited sources are generally fewer but selected with more internal arbitration. Citation is not guaranteed even if your page is crawled.

Should you allow OAI-SearchBot in robots.txt?

If you want to be cited in ChatGPT Search, yes. OAI-SearchBot is the retrieval bot for ChatGPT Search; blocking it means you are not a citation candidate. Note that ChatGPT-User is a separate user agent that fetches pages on demand when a user's request in ChatGPT requires it; allow both for maximum visibility.

Does ChatGPT Search favour certain types of sites?

Based on observations available in 2025-2026, ChatGPT Search more frequently cites sites with high organic traffic, sources considered authoritative in their sector (press, institutions, recognised publishers), and pages with clean structured data. Specialist sites with strong topical authority on a narrow domain outperform generalists.

Can you measure your visibility in ChatGPT Search?

Not directly: there is no official console in 2026. Available proxies: referral traffic from chatgpt.com in Google Analytics or Plausible, OAI-SearchBot entries in your server logs (Google Search Console does not report third-party bots), and monitoring via third-party tools such as Profound, AthenaHQ or Peec, which regularly query LLMs on target queries.

Optimising for ChatGPT Search: technical guide 2026

ChatGPT Search: what exactly are we talking about?

ChatGPT Search is the web search feature integrated into ChatGPT, rolled out progressively since November 2024. When a user asks a question requiring recent information, ChatGPT can trigger web retrieval via OAI-SearchBot, then synthesise the results, citing sources with clickable reference numbers.

It is important to distinguish the three OpenAI bots:

GPTBot: crawls to feed OpenAI model training. Blocking it does not protect against ChatGPT Search.
OAI-SearchBot: crawls for ChatGPT Search - the bot to allow if you want to be cited.
ChatGPT-User: executes requests on demand from users via ChatGPT browsing. Useful to allow as a complement.

Check your robots.txt: if you have a Disallow: / on GPTBot, it does not apply to OAI-SearchBot. Each bot must be managed separately.
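To make the per-bot behaviour concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and the example.com URL are hypothetical; the point is only that a rule written for GPTBot leaves the other two user agents governed by whatever else the file says.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is blocked, every other agent is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""

OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def openai_bot_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, for each OpenAI user agent, whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in OPENAI_AGENTS}


access = openai_bot_access(ROBOTS_TXT, "https://example.com/guide")
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running the same function against your real robots.txt content is a quick way to confirm that a training opt-out has not also removed you from search retrieval.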

ChatGPT Search architecture: retrieval + model memory

The main difference from Perplexity is architectural. Perplexity is almost exclusively RAG-based: its answer is built from real-time retrieval. ChatGPT Search works differently: GPT-4o has a dense knowledge base (training up to mid-2024) and decides, during generation, whether additional retrieval is needed.

This hybrid architecture has important practical consequences:

Cited sources are fewer than in Perplexity (often 2 to 4 versus 6 to 10) because the model completes the answer with its own memory.
Retrieval is triggered selectively, primarily for recent data, prices, ongoing events, and time-sensitive statistics.
Competition for citation is more intense: fewer available slots in the answer means more rigorous selection.

Source selection criteria for ChatGPT Search

1. Accessibility to OAI-SearchBot

A necessary but not sufficient condition. Your page must be crawlable by OAI-SearchBot, server-rendered (SSR/SSG, not an SPA without server rendering), and return a stable 200. Content behind a login, paywall or heavy JavaScript will not be indexed.

2. Domain authority in the sector

ChatGPT Search appears to weigh the thematic authority of the domain. Signals observed to correlate with citation: high Domain Rating (Ahrefs) or Domain Authority (Moz), significant organic traffic, and mentions on reference sites in adjacent fields. A site with narrow but deep topical authority (an expert on a specific subject) outperforms generalists with the same DR.

3. Content freshness

ChatGPT Search favours recently updated pages for queries where freshness is critical. Signals: dateModified in the Article schema, <meta name="last-modified"> tag, HTTP Last-Modified header, and the visible date in the content. All of these must be consistent with one another.

4. Self-contained sections

Like any RAG system, ChatGPT Search chunks content and selects the most relevant passages. A page where each section can be understood out of context (self-contained) increases the probability that a chunk will be selected and cited. A section that starts with "As mentioned above..." is an unusable chunk.
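As an illustration, the check below flags sections whose first words point back to earlier text. It is a rough sketch: the list of back-reference openers is an assumption and far from exhaustive, and the sections are passed in as hypothetical (heading, body) pairs rather than parsed from a real page.

```python
import re

# Assumed, non-exhaustive openers that make a passage depend on earlier text.
BACK_REFERENCE = re.compile(
    r"^(as (mentioned|noted|explained|seen) (above|earlier|previously)"
    r"|as we saw|in the previous section)",
    re.IGNORECASE,
)


def dependent_sections(sections: list[tuple[str, str]]) -> list[str]:
    """Return the headings of sections whose body opens with a back-reference."""
    return [
        heading
        for heading, body in sections
        if BACK_REFERENCE.match(body.strip())
    ]


# Hypothetical page content, already split into (heading, body) pairs.
sections = [
    ("What OAI-SearchBot crawls", "OAI-SearchBot crawls pages for ChatGPT Search."),
    ("Freshness signals", "As mentioned above, dates must be consistent."),
]
flagged = dependent_sections(sections)
print(flagged)
```

Any heading this returns is a candidate for rewriting so that its opening sentence restates the subject instead of relying on context.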

5. E-E-A-T signals and structured data

Article schema with author, datePublished and dateModified. Organization schema with sameAs. These structured signals are the E-E-A-T proxy signals most directly readable by a retrieval system.
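One possible way to produce this markup is sketched below. The property names come from schema.org; the headline, author, publisher and URLs are placeholders invented for the example, and attaching the Organization as the article's publisher is just one reasonable layout.

```python
import json


def article_jsonld(headline: str, author_name: str, date_published: str,
                   date_modified: str, publisher: str, same_as: list[str]) -> str:
    """Build a schema.org Article object, with its publisher, as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
        "dateModified": date_modified,
        "publisher": {
            "@type": "Organization",
            "name": publisher,
            "sameAs": same_as,  # profiles that identify the same organisation
        },
    }
    return json.dumps(data, indent=2)


# All values below are hypothetical placeholders.
markup = article_jsonld(
    headline="Optimising for ChatGPT Search",
    author_name="Jane Doe",
    date_published="2025-11-03",
    date_modified="2026-01-15",
    publisher="Example Media",
    same_as=["https://www.linkedin.com/company/example-media"],
)
print(markup)
```

The resulting string would typically be embedded in the page inside a script element of type application/ld+json.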

6. Factual density

ChatGPT Search is more selective than Perplexity on factual density: it prefers sources that bring figures, dates, names, precise definitions. Generic content ("it is important to note that...", "several factors come into play...") has little chance of being selected when a more factual source is available.

ChatGPT Search vs Perplexity: comparison table

Dimension	ChatGPT Search	Perplexity
Architecture	Model + selective RAG	RAG-first, near-exclusive
Sources cited per answer	2 to 4 on average	6 to 10 on average
Retrieval bot	OAI-SearchBot	PerplexityBot
Retrieval triggering	Selective (freshness, recent facts)	Systematic
Source bias	Towards known domains, high authority	More open to specialist sources
Freshness sensitivity	Very high for recent facts	High (real-time default)
Visibility measurement	chatgpt.com traffic + third-party monitoring	perplexity.ai traffic + third-party monitoring

6 optimisation levers for ChatGPT Search

Lever 1 - Allow OAI-SearchBot and ChatGPT-User

Check your robots.txt. Lines to add if absent:

User-agent: OAI-SearchBot
Disallow:

User-agent: ChatGPT-User
Disallow:

An empty Disallow: means "everything is allowed". Also verify that these bots are not blocked upstream of robots.txt, for example by a WAF rule or by Cloudflare's Bot Fight Mode.

Lever 2 - Strict server rendering

ChatGPT Search does not execute JavaScript to render main content. Your pages must return their text content in the initial HTML (SSR or SSG). Test with curl -A "OAI-SearchBot" on a URL: if the response HTML contains your content, the page is being served correctly to the bot.
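The same test can be scripted. The sketch below is an assumption-laden approximation: it fetches raw HTML with a simplified "OAI-SearchBot" user-agent string (as in the curl command), ignores anything inside script and style elements, and reports which expected phrases are missing. The two HTML samples and the helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleText(HTMLParser):
    """Collect text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def missing_phrases(html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do not appear in the non-script text of html."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split()).lower()
    return [p for p in phrases if p.lower() not in text]


def fetch_initial_html(url: str, user_agent: str = "OAI-SearchBot") -> str:
    """Fetch the raw HTML without executing JavaScript (network call)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical responses: a client-rendered shell and a server-rendered page.
spa_shell = ('<html><body><div id="root"></div>'
             '<script>render("Pricing starts at 49 EUR")</script></body></html>')
ssr_page = "<html><body><h1>Pricing</h1><p>Pricing starts at 49 EUR</p></body></html>"

print(missing_phrases(spa_shell, ["Pricing starts at 49 EUR"]))
print(missing_phrases(ssr_page, ["Pricing starts at 49 EUR"]))
```

In practice you would pass the result of fetch_initial_html(your_url) to missing_phrases along with a few sentences copied from the rendered page; a non-empty result suggests the content only exists after client-side rendering.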

Lever 3 - Visible and consistent freshness

Update dateModified in your Article schema on each significant revision. Display the last update date visibly within the article. Ensure that the HTTP Last-Modified header is consistent with the schema date.
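A consistency check along these lines could look like the sketch below. It assumes you have already extracted the three values (schema dateModified, the HTTP Last-Modified header, and the visible date as an ISO string); the one-day tolerance is an arbitrary assumption to absorb time-zone differences, and the sample values are invented.

```python
from datetime import date, datetime
from email.utils import parsedate_to_datetime


def freshness_dates(schema_date_modified: str, http_last_modified: str,
                    visible_date: str) -> dict[str, date]:
    """Normalise the three freshness signals to calendar dates."""
    return {
        # ISO 8601 value from the Article schema, e.g. 2026-01-15T09:30:00+00:00
        "schema": datetime.fromisoformat(schema_date_modified).date(),
        # HTTP-date format, e.g. Thu, 15 Jan 2026 09:30:00 GMT
        "http": parsedate_to_datetime(http_last_modified).date(),
        # Visible date, assumed to be pre-converted to YYYY-MM-DD
        "visible": date.fromisoformat(visible_date),
    }


def is_consistent(dates: dict[str, date], tolerance_days: int = 1) -> bool:
    """True when the earliest and latest signals differ by at most tolerance_days."""
    values = list(dates.values())
    return (max(values) - min(values)).days <= tolerance_days


# Hypothetical values for a page revised on 15 January 2026.
dates = freshness_dates(
    "2026-01-15T09:30:00+00:00",
    "Thu, 15 Jan 2026 09:30:00 GMT",
    "2026-01-15",
)
print(dates, is_consistent(dates))

# Same page, but the server still reports an old Last-Modified header.
stale = freshness_dates(
    "2026-01-15T09:30:00+00:00",
    "Mon, 01 Dec 2025 08:00:00 GMT",
    "2026-01-15",
)
print(is_consistent(stale))
```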

Lever 4 - Self-contained sections with factual headings

Each H2/H3 section must function as an autonomous answer. The section heading must include the key concept ("OAI-SearchBot", "ChatGPT Search retrieval") so that the selected chunk can be used directly. End each section with an actionable conclusion or a key figure.

Lever 5 - Targeted factual density

Include in each page at least 3 to 5 factual claims with precise figures, dates or concrete examples. These should not be filler; each fact must be sourceable. Content with high factual density is what ChatGPT Search seeks to cite in order to lend credibility to its answers.
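For a first-pass audit, a crude proxy is to count the sentences that contain at least one digit. This is only a heuristic chosen for the example, not a description of how ChatGPT Search scores content: it misses facts expressed without numbers and counts numbers that carry no factual weight.

```python
import re


def factual_sentences(text: str) -> list[str]:
    """Return sentences containing at least one digit (a rough proxy for a fact)."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if re.search(r"\d", s)]


# Hypothetical page extract mixing factual and generic sentences.
sample = (
    "ChatGPT Search has been rolling out since November 2024. "
    "It is important to note that several factors come into play. "
    "Answers often cite 2 to 4 sources."
)
found = factual_sentences(sample)
print(len(found), found)
```

A page where this count stays below the 3 to 5 target is worth rereading for generic filler.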

Lever 6 - Build domain thematic authority

ChatGPT Search favours domains recognised in their sector. Two priority actions: publish reference content regularly on your specialty (to build in-depth topical authority), and earn mentions and backlinks from recognised thematic sources that are likely to appear in OpenAI training corpora.

Measuring your visibility in ChatGPT Search

In the absence of an official ChatGPT Search console, available proxies in 2026:

Referral traffic from chatgpt.com: in Google Analytics 4 or Plausible, segment on source = chatgpt.com. This traffic is underestimated (many users copy-paste without clicking) but gives a measurable floor.
OAI-SearchBot requests in server logs: Google Search Console does not show third-party bots, but server logs (Nginx/Apache/Cloudflare) show OAI-SearchBot requests with the crawled URLs.
Active LLM monitoring: tools such as Profound, AthenaHQ, Peec and Otterly regularly query ChatGPT on target queries and detect when your site is cited. This is the most direct method even if it remains token-intensive.
Regular manual testing: each week, ask ChatGPT 5 to 10 queries from your sector with the search feature enabled (the globe icon). Note whether your domain appears in the cited sources.
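The server-log proxy above can be sketched as follows. The parser assumes the common "combined" access-log format used by Nginx and Apache; the log lines are invented samples, and matching on the user-agent string alone is a simplification, since user agents can be spoofed.

```python
import re
from collections import Counter

# Assumes the "combined" log format: request, status, size, referrer, user agent.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def searchbot_hits(lines, agent_token: str = "OAI-SearchBot") -> Counter:
    """Count requests per (path, status) whose user agent contains agent_token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and agent_token in match.group("agent"):
            hits[(match.group("path"), match.group("status"))] += 1
    return hits


# Invented sample lines; real logs would be read from a file.
SAMPLE_LOG = [
    '203.0.113.7 - - [12/Jan/2026:10:00:01 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.7 - - [12/Jan/2026:10:05:44 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.9 - - [12/Jan/2026:10:06:10 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '198.51.100.4 - - [12/Jan/2026:10:07:02 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

hits = searchbot_hits(SAMPLE_LOG)
for (path, status), count in hits.most_common():
    print(f"{count:>3}  {status}  {path}")
```

Grouping by status as well as path makes it easy to spot pages the bot requests but cannot retrieve, which ties back to the stable-200 requirement.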

FAQ - ChatGPT Search and optimisation

What is the difference between GPTBot and OAI-SearchBot?: GPTBot crawls to feed OpenAI model training. OAI-SearchBot crawls for ChatGPT Search, the real-time search feature. They are two distinct bots. Blocking GPTBot does not prevent ChatGPT Search from citing you.
Does ChatGPT Search cite sources like Perplexity?: Yes, but less systematically. Perplexity is almost exclusively RAG-based. ChatGPT Search combines model memory with selective retrieval: cited sources are fewer (2 to 4 vs 6 to 10) but selected with more arbitration.
Should you allow OAI-SearchBot in robots.txt?: If you want to be cited in ChatGPT Search, yes. OAI-SearchBot is the retrieval bot for ChatGPT Search. Blocking it means you are not a citation candidate.
Does ChatGPT Search favour certain types of sites?: Based on observations available in 2026, ChatGPT Search more frequently cites sites with high organic traffic, authoritative sources in their sector, and pages with clean structured data. Narrow specialists outperform generalists.
Can you measure your visibility in ChatGPT Search?: Not directly. Available proxies: referral traffic from chatgpt.com, OAI-SearchBot logs, active LLM monitoring (Profound, AthenaHQ, Peec), and regular manual testing with the ChatGPT search feature enabled.

ChatGPT Search checklist (8 points)

OAI-SearchBot and ChatGPT-User are allowed in robots.txt and not blocked by WAF.
Key pages are rendered SSR or SSG; content is in the initial HTML.
dateModified in Article schema is up to date and consistent with visible date and HTTP Last-Modified header.
Each H2/H3 section is self-contained and starts with the key concept (not "as mentioned above").
Each page contains at least 3 factual claims with figures, dates or concrete examples.
Article schema is implemented with author, datePublished and dateModified.
ChatGPT Search visibility monitoring is in place (referral traffic + LLM monitoring).
Domain topical authority is built via a cluster of reference content on the main subject.