In brief
RAG (Retrieval-Augmented Generation) is the technique that allows an LLM to consult external sources before generating its response. Without RAG, the model responds only from what it memorised during training. With RAG, it searches, retrieves and cites passages from relevant documents in real time. Perplexity, ChatGPT Search, Google AI Overviews, Bing Copilot - all use a form of RAG to produce cited responses. Optimising your content for RAG means optimising it to be selected during the retrieval step.
1. How RAG works
A RAG system follows a four-step pipeline:
- Indexing. Documents (web pages, PDFs, databases) are cut into chunks (segments of 100 to 500 tokens) and transformed into numerical vectors (embeddings) that represent their meaning. These vectors are stored in a vector database.
- Retrieval. When a user asks a question, the query is transformed into a vector. The system looks for the chunks whose vectors are closest to it ("semantic neighbours") and typically selects the 3 to 10 most relevant.
- Augmentation. The selected chunks are injected into the LLM context ("prompt") alongside the user's question. The model therefore receives: [question] + [source passages].
- Generation. The LLM produces a response that draws on the injected passages. In engines that display citations (Perplexity, ChatGPT Search), each claim is associated with its source.
What determines whether your content is cited is primarily the retrieval step. If your chunk is not selected at that stage, it will not exist in the final response, regardless of the quality of your page.
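The four steps above can be sketched in a few lines. This is a toy illustration, not how any production engine is implemented: the `embed` function is a bag-of-words stand-in for a neural embedding model, and all function names and sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): word counts instead of a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    # Step 1, indexing: store every chunk next to its vector.
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=3):
    # Step 2, retrieval: rank chunks by similarity to the query vector.
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def augment(question, passages):
    # Step 3, augmentation: question + numbered source passages.
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return f"Question: {question}\n\nSource passages:\n{sources}"


chunks = [
    "RAG lets a language model consult external documents before answering.",
    "Sourdough bread needs a long fermentation.",
    "A vector database stores embeddings for similarity search.",
]
index = build_index(chunks)
prompt = augment("What is RAG?", retrieve(index, "What is RAG?", k=1))
print(prompt)  # Step 4, generation, would send this prompt to an LLM.
```

Only the first chunk shares a word with the query, so it is the one injected into the prompt. The other two never reach the model, whatever their quality.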
2. Two types of RAG to know
2.1 Web RAG (real-time)
Used by Perplexity, ChatGPT Search, Bing Copilot and surfaces with active retrieval. The system performs a real web search on each query, crawls pages in the top results, chunks and semantically evaluates passages in a few seconds. Freshness and indexability (robots.txt, HTML rendering) are critical here.
2.2 Memorised RAG (training + private index)
Used by Claude, ChatGPT without Search, Gemini (standard mode). The LLM responds from its parametric memory - the information absorbed during training. Strictly speaking, no retrieval step runs at query time in this mode. Here, visibility depends on having been crawled and included in training data (Common Crawl, C4, proprietary data). This cannot be controlled directly, but it is influenced by site popularity and by how often training bots crawl the site.
Practical implication: optimising for web RAG (Perplexity, ChatGPT Search) gives faster and more measurable results than optimising for LLM memory. That is where to concentrate efforts in 2026.
3. What retrieval evaluates in your content
3.1 Semantic proximity
Vector retrieval measures the cosine distance between the query vector and the vector of each chunk. The closer a chunk is semantically to the question, the more chance it has of being selected.
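For reference, the usual definition of this measure is given below, where q is the query vector and c a chunk vector. The symbols are notation introduced here, and real engines may use other metrics (dot product, Euclidean distance).

```latex
\[
\operatorname{sim}(q, c) = \frac{q \cdot c}{\lVert q \rVert \, \lVert c \rVert},
\qquad
d_{\cos}(q, c) = 1 - \operatorname{sim}(q, c)
\]
% Worked example with q = (1, 0) and c = (1, 1):
\[
\operatorname{sim}(q, c) = \frac{1 \cdot 1 + 0 \cdot 1}{1 \cdot \sqrt{2}} \approx 0.707,
\qquad
d_{\cos}(q, c) \approx 0.293
\]
```

A distance of 0 means the two vectors point in the same direction. The larger the distance, the less likely the chunk is to be selected.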
Practical consequence: cover the exact vocabulary of the query in your content. Not keyword stuffing, but content that uses the same terms and synonyms that users employ when searching. An article on "generative engine optimisation" that never uses the word "GEO" or "AEO" will be sub-optimal for queries including those terms.
3.2 Informational density of the chunk
A chunk that contains a single precise and complete piece of information is preferred over a chunk containing five vague pieces of information. RAG systems also evaluate the internal coherence of the passage: a chunk that jumps between three different subjects has a diluted, less semantically precise embedding.
3.3 Self-containment
A chunk injected into an LLM prompt is read without its original context. If the passage says "this technique can increase citation rate" without naming the technique, the LLM cannot use this information. Each passage must be understandable on its own.
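A rough way to spot passages that lean on missing context is to flag paragraphs that open with a pronoun or demonstrative. The sketch below is a heuristic only: the word list and the function name are assumptions, and it will produce both false positives and false negatives.

```python
import re

# Hypothetical list of openers that usually point back to earlier text.
ORPHAN_OPENERS = ("it", "this", "that", "these", "those", "they", "such")


def starts_with_orphan_reference(paragraph):
    """Return True if the first word of the paragraph is a likely orphan opener."""
    first_word = re.match(r"\W*(\w+)", paragraph)
    return bool(first_word) and first_word.group(1).lower() in ORPHAN_OPENERS


print(starts_with_orphan_reference("This technique can increase citation rate."))
print(starts_with_orphan_reference("Structured chunking cuts a page along its HTML tags."))
```

The first call flags the paragraph and the second does not. A flagged paragraph is a candidate for rewriting with the subject named explicitly.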
3.4 Factual precision
LLMs favour passages that contain verifiable claims: figures, dates, proper nouns, URLs, references to standards or norms. A vague claim ("many sites") is less exploitable than a precise claim ("according to the Princeton GEO study 2023, content enriched with citations is 30% more often picked up").
4. Chunking: understanding how your page is cut
RAG systems automatically cut pages into chunks. Two chunking strategies coexist:
- Fixed chunking (by tokens). The page is cut every N tokens (often 200-300), with an overlap of 20-50 tokens so that context at a boundary is preserved in both neighbouring chunks. Simple but blind to semantic structure.
- Structured chunking (by HTML tags / sections). The page is cut according to its HTML tags (h2, h3, p, li). Each section becomes a chunk. More intelligent, and strongly favoured by systems that read HTML.
Implication for the writer: structuring content into clear HTML sections (one h2 = one topic = one potential chunk) favours structured chunking and improves embedding quality. Tables, lists and code blocks are often treated as autonomous chunks, which is an advantage if their content is self-contained.
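Both strategies can be sketched with the standard library. This is a simplified illustration under stated assumptions: whitespace-separated words stand in for tokens, only h2 and h3 open a new chunk, and the names `fixed_chunks`, `HeadingChunker` and `structured_chunks` are hypothetical.

```python
from html.parser import HTMLParser


def fixed_chunks(text, size=250, overlap=30):
    """Cut text every `size` words, repeating `overlap` words between chunks."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # A new chunk is needed only while words remain beyond the previous chunk.
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


class HeadingChunker(HTMLParser):
    """Start a new chunk at every h2/h3 and collect the text that follows."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self.chunks.append([])

    def handle_data(self, data):
        text = data.strip()
        # Text that appears before the first heading is ignored in this sketch.
        if text and self.chunks:
            self.chunks[-1].append(text)


def structured_chunks(html):
    parser = HeadingChunker()
    parser.feed(html)
    parser.close()
    return [" ".join(parts) for parts in parser.chunks]


page = (
    "<h2>What is RAG</h2><p>RAG retrieves passages.</p>"
    "<h2>Chunking</h2><p>Pages are cut.</p><ul><li>Fixed</li><li>Structured</li></ul>"
)
print(structured_chunks(page))
print(fixed_chunks("0 1 2 3 4 5 6 7 8 9", size=4, overlap=1))
```

With structured chunking, each heading travels with its own text, which is why a section that covers a single topic produces a cleaner chunk.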
5. What web RAG does not read (or reads poorly)
Certain page elements are systematically ignored or poorly handled in the web RAG pipeline:
- Content rendered only in JavaScript: an SPA without SSR returns empty HTML to the crawler. The chunk will be empty or unusable.
- Images without alt text: an infographic with key data but no alternative text is invisible to retrieval.
- Embedded videos without a transcript: audio/video content is not indexed by retrieval crawlers.
- PDFs without extractable text: PDFs that are scans or image PDFs cannot be parsed.
- Navigation menus: RAG systems attempt to filter navigation content to keep only editorial content, but navigational content integrated into the body can be incorrectly captured.
6. Concrete levers for RAG-ready content
6.1 Structure in self-contained sections
Each h2/h3 section of your page must be able to function as an independent chunk. Practical rule: if you copied the paragraph into a tweet with no surrounding context, would it still be understandable? If not, add an introductory sentence that explicitly names the subject.
6.2 Favour definitions at the start of sections
RAG systems work better when the subject is announced at the head of the chunk. Preferred format: "[Term] is [complete definition]. It works by [mechanism]. The main use cases are [list]." This pattern mimics the structure of an encyclopaedia entry, a format that LLMs have encountered at very large scale during training.
6.3 Include sourced quantitative data
Passages with precise figures and named sources have a more discriminating embedding (fewer web pages contain exactly that figure), hence a better retrieval position on relevant queries. Recommended format:
"According to [source], [precise fact with figure], measured in [year]."
6.4 Add Article schema with dateModified
Web RAG systems consult schema.org metadata to validate the freshness of a document. An article with dateModified: 2026-04-22 will be preferred over an undated article or one dated 2021 on queries with currency intent (type "in 2026", "currently").
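As an illustration, the snippet below builds a minimal Article block in JSON-LD. The helper name `article_jsonld`, the headline and the publication date are placeholders. Only the property names (headline, datePublished, dateModified) come from schema.org.

```python
import json
from datetime import date


def article_jsonld(headline, published, modified):
    """Return a <script> tag holding a minimal schema.org Article description."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    # Escape "</" so a headline can never close the script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Placeholder values for illustration only.
print(article_jsonld("How RAG works", date(2025, 1, 10), date(2026, 4, 22)))
```

The resulting tag goes in the page's HTML, and dateModified should change only when the content has genuinely been updated.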
6.5 Enable server rendering (SSR)
If your site is built with React, Next.js or Vue, make sure the editorial content is server-rendered and present in the initial HTML. A curl https://your-site.com/page/ must return the text of the page, not an empty DOM. That is the simplest test to verify the RAG eligibility of your page.
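The same check can be scripted by counting the words that are visible in the raw HTML, ignoring scripts and styles. The sketch below is an approximation: the threshold of 50 words and the names `visible_word_count` and `looks_server_rendered` are arbitrary assumptions.

```python
from html.parser import HTMLParser

HIDDEN_TAGS = ("script", "style", "noscript")


class VisibleText(HTMLParser):
    """Collect words that sit outside script/style/noscript elements."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(data.split())


def visible_word_count(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)


def looks_server_rendered(html, min_words=50):
    """Heuristic (assumed threshold): enough readable text in the initial HTML."""
    return visible_word_count(html) >= min_words


spa_shell = '<html><body><div id="root"></div><script>var a = "hello world";</script></body></html>'
print(visible_word_count(spa_shell))
```

For the empty SPA shell the count is zero. To test a live page, the HTML can be fetched first, for example with `urllib.request.urlopen(url).read().decode("utf-8", "replace")`, and passed to `looks_server_rendered`.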
6.6 Cover terminological variants of your subject
Semantic retrieval does not rely only on exact words, but covering synonyms and acronyms improves the overall embedding. On an article about RAG, including "retrieval augmented generation", "RAG", "search augmented generation", "vector database", "embeddings" in the same document improves semantic proximity for all these queries.
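A quick audit of which variants a draft already contains can be automated. The helper below is a hypothetical sketch that matches each variant as a whole word or phrase, case-insensitively, without stemming or plural handling.

```python
import re


def variant_coverage(text, variants):
    """Map each variant to True if it occurs in text as a whole word or phrase."""
    lowered = text.lower()
    return {
        v: re.search(r"(?<!\w)" + re.escape(v.lower()) + r"(?!\w)", lowered) is not None
        for v in variants
    }


draft = "Retrieval augmented generation (RAG) stores embeddings in a vector database."
report = variant_coverage(
    draft,
    ["RAG", "retrieval augmented generation", "search augmented generation", "embeddings"],
)
for variant, present in report.items():
    print(f"{variant}: {'present' if present else 'missing'}")
```

In this sample draft, only "search augmented generation" is reported as missing.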
7. RAG and glossary: a powerful combination
Glossary pages are ideal RAG candidates. Why? Because they contain exactly what RAG seeks: a concise, self-contained definition of a precise term. A query like "what is RAG" is more likely to retrieve a well-structured glossary page than a long article.
Recommendation: create a dedicated glossary page for each central technical term in your domain. Each glossary entry should contain: formal definition, synonyms, difference from adjacent terms, and a concrete example. This format is directly compatible with structured chunking.
8. Measuring RAG eligibility of your pages
There is no native "RAG score" tool. The most useful proxies:
- Manual Perplexity test. Submit a very specific query that your page should cover. Do you appear in the sources? If not, which sources are cited, and what makes them better structured?
- Self-containment audit. Take 5 random paragraphs from your key pages. Read each one out of context. Count how many make sense without the rest. Target: 4/5 minimum.
- HTML analysis. curl -s https://your-page.com/ | grep -cE "<p[ >]" counts the lines of the initial HTML that contain a p tag (grep -c counts matching lines, not individual tags). If result = 0, the page is probably not server-rendered.
- Third-party tools. Profound, Otterly and Scrunch offer citation reports that indirectly reveal if your pages pass the retrieval filtering of Perplexity and ChatGPT.
RAG-ready checklist
- Content server-rendered (SSR), verifiable via curl
- Each h2/h3 section is self-contained (understandable without context)
- No orphan pronouns at the start of sections ("it", "this method" without a named referent)
- At least 3 sourced quantitative data points in the article
- Key terms and synonyms present in the body text
- Article schema with datePublished + dateModified up to date
- Tables and lists with explicit headings (not "see below")
- Strategic images with descriptive and factual alt text
FAQ
Is RAG used by all AI engines?
Nearly all, with variants. Perplexity and ChatGPT Search use web RAG (real-time retrieval from online sources). Pure LLMs like Claude or GPT-4 without plugins answer from the parametric memory of the model, which section 2.2 calls memorised RAG even though no retrieval step runs at query time. Google AI Overviews combines both. The retrieval + generation principle applies to every engine that displays cited sources.
Is well-structured content enough to be selected by RAG?
Structure is necessary but not sufficient. Retrieval first selects by semantic relevance (does your content cover the query?) then by passage quality (is it self-contained, factual, precise?). Perfectly structured but vague content will not be cited.
What is the ideal passage size for RAG?
RAG systems cut documents into chunks of 100 to 500 tokens (roughly 75 to 375 words). Aim for paragraphs of 80 to 150 words per section, each answering a precise intent. Sections that are too long dilute relevance; sections that are too short lack context.
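The figures above imply a ratio of about 0.75 words per token, which is an approximation that varies by language and tokeniser. A small helper, with hypothetical names, makes the arithmetic and the 80 to 150 word target easy to check on a draft.

```python
# Ratio implied by "100 to 500 tokens, 75 to 375 words" (approximation).
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens):
    """Convert a token count to an approximate word count."""
    return round(tokens * WORDS_PER_TOKEN)


def section_length_verdict(paragraph, low=80, high=150):
    """Classify a paragraph against the 80 to 150 word target."""
    n = len(paragraph.split())
    if n < low:
        return "too short"
    if n > high:
        return "too long"
    return "ok"


print(tokens_to_words(100), tokens_to_words(500))
print(section_length_verdict("word " * 100))
```

The conversion gives 75 and 375 words for the two chunk bounds, and a 100-word paragraph falls inside the target range.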
Does RAG take PageRank or domain authority into account?
Not directly in semantic scoring. But web RAG systems (Perplexity, ChatGPT Search) first filter by sources present in an underlying search engine (often Bing). A domain with weak Bing authority will simply be absent from retrieval candidates, before semantic evaluation even begins.
How do I know if my content is eligible for RAG?
Practical test: take a paragraph from your page, read it without context and ask yourself whether an LLM could use it to answer a specific question. If it contains a complete, sourceable and verifiable claim, it is eligible. If it uses orphan pronouns or implicit references ("this method" without naming which one), it is not.