Content structure for LLMs

Good content for AI engines is neither shorter nor longer than good content for humans. It is better chunked, better sourced, and easier to read out of context. This page sets out the rules of that content design.

Updated 14 April 2026 · 10 min read

Core principle: write for retrieval

AI engines in search mode (ChatGPT Search, Perplexity, AI Overviews) work in two steps: a retrieval stage that pulls relevant passages from a corpus, then a generation stage that synthesises a response while citing those passages. Optimising for retrieval means making each of your paragraphs readable out of context.
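The two steps can be sketched in a few lines. This is a toy illustration, not how any named engine actually works: word overlap stands in for a real relevance model, `generate` is only a placeholder for the synthesis step, and the product "Acme Sync" and both passages are invented for the example.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a real relevance model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Step 1: rank passages by word overlap with the query, keep the top k."""
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]

def generate(query: str, cited: list[str]) -> str:
    """Step 2 (placeholder): the answer is built only from retrieved passages."""
    sources = " ".join(f"[{i + 1}] {p}" for i, p in enumerate(cited))
    return f"Q: {query}\nSources: {sources}"

# Invented sample passages: the first depends on its context, the second does not.
passages = [
    "It also supports this since the last release.",
    "Acme Sync supports offline mode since version 2.0.",
]
query = "Does Acme Sync support offline mode?"
top = retrieve(query, passages, k=1)
print(generate(query, top))
```

The first passage shares no words with the query because its subject ("It") and object ("this") live in another paragraph, so it is never retrieved and can never be cited.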

Chunking: the granularity that matters

Retrieval systems split documents into chunks of a few hundred to a few thousand characters. Chunk boundaries often follow HTML structure (headings, paragraphs).

HTML component | Role in chunking | Best practice
--- | --- | ---
H2 | Hard boundary | One H2 = one distinct intent, with its implicit long-tail query.
H3 | Secondary boundary | Sub-question or sub-aspect, never decorative.
Paragraph | Typical chunk unit | 3 to 6 lines. One idea per paragraph.
List | Near-extractable as-is | Standalone items, no "see above" references.
Table | Extracts very well | Clear headers, short cells, avoid merged cells.
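To see why headings act as boundaries, here is a minimal sketch of a structure-aware chunker built on Python's standard `html.parser`. It assumes one chunk per paragraph, labelled with the H2 and H3 it sits under; real retrieval systems use their own splitting rules, and the `Chunker` class and the sample HTML are made up for this illustration.

```python
from html.parser import HTMLParser

class Chunker(HTMLParser):
    """Emit one chunk per <p>, tagged with the H2/H3 it sits under."""

    def __init__(self) -> None:
        super().__init__()
        self.h2 = ""
        self.h3 = ""
        self.current_tag = None          # the h2/h3/p element being read, if any
        self.buffer: list[str] = []
        self.chunks: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3", "p"):
            self.current_tag = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current_tag:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self.current_tag:
            return                       # inline tags such as </b> are ignored
        text = " ".join("".join(self.buffer).split())
        if tag == "h2":
            self.h2, self.h3 = text, ""  # hard boundary: a new H2 resets the H3
        elif tag == "h3":
            self.h3 = text
        elif text:
            self.chunks.append({"h2": self.h2, "h3": self.h3, "text": text})
        self.current_tag = None

# Invented sample page.
html = """
<h2>Pricing</h2>
<p>The Pro plan costs 20 euros per month.</p>
<h3>Discounts</h3>
<p>Annual billing gives <b>two months</b> free.</p>
<h2>Support</h2>
<p>Support answers within one business day.</p>
"""
chunker = Chunker()
chunker.feed(html)
for chunk in chunker.chunks:
    print(chunk)
```

Each chunk carries only its heading labels and its own text, which is why a decorative H3 or a paragraph that leans on its neighbours degrades what the retriever sees.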

Standalone passages: test each one

Simple test: copy any paragraph of your page and paste it into an empty message to a colleague. If the paragraph stays understandable, it's standalone.
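The colleague test can be roughly pre-screened in code. The sketch below is a heuristic only: the two word lists are assumptions chosen for this example, they will produce false positives (a paragraph opening with "This guide" is flagged), and they do not replace a human read.

```python
import re

# Assumed openers that usually point at text outside the paragraph.
DANGLING_OPENERS = re.compile(
    r"^(it|this|that|these|those|they|as (mentioned|seen|noted) (above|earlier))\b",
    re.IGNORECASE,
)
# Assumed phrases that refer to a position elsewhere on the page.
BACK_REFERENCES = re.compile(
    r"\b(see above|see below|as above|the latter|the former)\b",
    re.IGNORECASE,
)

def standalone_issues(paragraph: str) -> list[str]:
    """Return the reasons a paragraph may not survive being read in isolation."""
    text = paragraph.strip()
    issues = []
    if DANGLING_OPENERS.match(text):
        issues.append("opens with a reference to earlier text")
    if BACK_REFERENCES.search(text):
        issues.append("contains a positional back-reference")
    return issues

# Invented sample paragraphs.
print(standalone_issues("It also works offline, see above."))
print(standalone_issues("Acme Sync works offline since version 2.0."))
```

An empty list means nothing was flagged, not that the paragraph is guaranteed to stand alone.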

Citation-friendly content

A cited passage is one the model can display with confidence. It has three traits:

  1. A sharp claim: "Google AI Overviews rolled out broadly in 2025" is citable. "AI is changing SEO" isn't.
  2. Minimum context: who, what, when. No ambiguity on the subject.
  3. Verifiability: an external source, a published datum, an author.

Entities and disambiguation

LLMs bind your content to entities. If your brand shares its name with something else (a plant, a person, another company), disambiguation is priority one. Techniques:

Anatomy of a GEO page

  1. H1: the primary query, 6 to 12 words, no superlatives.
  2. Lede: 2 to 4 sentences that already answer the question. The first sentence is standalone.
  3. Dates: publication and last update, both visible.
  4. H2 "In brief": 3 to 5 bullets, each citable as-is.
  5. Body: 5 to 8 H2 sections covering sub-intents.
  6. Table or checklist: at least one dense, extractable element.
  7. Contextual FAQ: 3 to 6 questions specific to the page, not generic ones.
  8. Outbound linking: 3 to 6 internal contextual links, 1 to 3 external source links.
  9. Author and organisation: schema.org Article + Organization.
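For the last item, here is a minimal sketch of the JSON-LD such a page could embed, generated with Python's `json` module. The property names (`headline`, `datePublished`, `dateModified`, `author`, `publisher`) are standard schema.org Article properties; the author name, organisation name and publication date are placeholders invented for the example, and the `article_jsonld` helper is hypothetical.

```python
import json

def article_jsonld(headline: str, author: str, org: str,
                   published: str, modified: str) -> str:
    """Build a schema.org Article with a nested Organization as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published,   # ISO 8601 dates
        "dateModified": modified,
        "author": {"@type": "Person", "name": author},
        "publisher": {"@type": "Organization", "name": org},
    }
    return json.dumps(data, indent=2)

# Placeholder values: only the headline and update date come from this page.
snippet = article_jsonld(
    headline="Content structure for LLMs",
    author="Jane Doe",
    org="Example Org",
    published="2026-01-10",
    modified="2026-04-14",
)
print(snippet)
```

The output goes inside a `<script type="application/ld+json">` element in the page head, and the dates it declares should match the dates shown to readers.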

Length, format, density

There's no magic length. A page must cover its subject, not hit a word quota. Benchmarks:

Common mistakes observed

Express checklist

  • Each H2 carries a clear intent and reformulates a query.
  • Each paragraph can be read in isolation.
  • Every numerical claim is dated and sourced.
  • Every acronym is defined at first occurrence.
  • The page contains at least one table or checklist.
  • The page carries a visible update date.
  • Internal linking goes out to at least 3 other pages on the site.
  • schema.org structured data is validated.
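Two of these checks lend themselves to a rough automated pass. The sketch below covers only acronym definitions and dated figures, under narrow assumptions: an acronym counts as defined when its first occurrence sits in parentheses or is immediately followed by an opening parenthesis, and a "numerical claim" is reduced to a sentence containing a percentage. The function names and the sample text, including its figures, are invented.

```python
import re

ACRONYM = re.compile(r"\b[A-Z]{2,6}\b")
YEAR = re.compile(r"\b(19|20)\d{2}\b")
PERCENT = re.compile(r"\d+(?:[.,]\d+)?\s?%")

def undefined_acronyms(text: str) -> list[str]:
    """Acronyms whose first occurrence is not next to a parenthesised expansion."""
    seen: set[str] = set()
    missing: list[str] = []
    for m in ACRONYM.finditer(text):
        token = m.group()
        if token in seen:
            continue
        seen.add(token)
        before = text[max(m.start() - 1, 0):m.start()]
        in_parens = before == "(" and text[m.end():m.end() + 1] == ")"
        expands_after = text[m.end():m.end() + 2] == " ("
        if not (in_parens or expands_after):
            missing.append(token)
    return missing

def undated_figures(text: str) -> list[str]:
    """Sentences that quote a percentage without a four-digit year."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if PERCENT.search(s) and not YEAR.search(s)]

# Invented sample text; the figures are made up.
page = (
    "Generative Engine Optimisation (GEO) complements SEO. "
    "In 2025, 40% of sampled queries showed a generated answer. "
    "Click-through fell by 12%."
)
print(undefined_acronyms(page))
print(undated_figures(page))
```

The remaining items, such as validating structured data or judging whether an H2 carries a clear intent, still need a dedicated validator or a human review.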