Schema.org and LLMs: the complete guide to structuring data for AI

In brief

Schema.org is a standardised vocabulary for structured data, launched in 2011 by Google, Microsoft and Yahoo, with Yandex joining later that year. It lets you annotate HTML content so that machines - search bots, LLMs, AI agents - understand the nature of what they read, not just the words. An article with an Article schema is immediately identifiable as a dated publication. A FAQ page with a FAQPage schema exposes its questions and answers in a directly usable form, without any HTML analysis. This guide covers the most important schemas for visibility in AI answer engines, with ready-to-copy JSON-LD examples.

1. Why schema.org matters for LLMs

LLMs interact with your content at two distinct moments:

Training (GPTBot, ClaudeBot, Applebot-Extended...). Training crawlers collect billions of pages. Schema.org gives them an explicit way to categorise the document (article, FAQ, organisation, person), detect its date and author, and understand relationships between entities (sameAs, memberOf). This can influence what the model "knows" about your brand after training.
Web retrieval (web RAG) (Perplexity, ChatGPT Search). The system crawls the page and extracts passages. The Article schema with dateModified gives it an explicit freshness signal. The FAQPage schema exposes question/answer pairs directly to retrieval, as ready-made, cleanly delimited chunks.

In parallel, schema.org supports performance in Google Search (rich results, and more generally any feature that benefits from clearly typed content, such as AI Overviews) and in Bing, which is widely reported to be one of the sources behind ChatGPT Search and other web RAG systems. The impact is therefore both direct (systems that parse JSON-LD read the schema) and indirect (better visibility in the source indexes of web RAG).

2. Priority schemas by page type

2.1 Article / BlogPosting

Use on all editorial content pages (articles, guides, analyses). Essential fields:

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Exact article title",
  "description": "150-160 character summary",
  "datePublished": "2026-04-22",
  "dateModified": "2026-04-22",
  "inLanguage": "en",
  "author": {
    "@type": "Organization",
    "name": "Your organisation",
    "url": "https://your-site.com"
  },
  "mainEntityOfPage": "https://your-site.com/article/"
}

Common errors: omitting dateModified (freshness cannot be detected), leaving description empty or identical to the title, omitting inLanguage on multilingual sites.
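
The common errors above can be caught at generation time. The following is a minimal Python sketch, not a reference implementation: the function name build_article_jsonld and its parameters are hypothetical, and the checks (non-empty description that differs from the headline, dateModified not earlier than datePublished) are assumptions drawn from the list of errors above.

```python
import json
from datetime import date


def build_article_jsonld(headline, description, url, published, modified,
                         language="en", org_name="Your organisation",
                         org_url="https://your-site.com"):
    """Return an Article JSON-LD dict with full YYYY-MM-DD dates."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    if not description.strip() or description.strip() == headline.strip():
        raise ValueError("description must be non-empty and differ from the headline")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "description": description,
        # date.isoformat() always yields YYYY-MM-DD
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "inLanguage": language,
        "author": {"@type": "Organization", "name": org_name, "url": org_url},
        "mainEntityOfPage": url,
    }


article = build_article_jsonld(
    headline="Exact article title",
    description="A short summary that differs from the title.",
    url="https://your-site.com/article/",
    published=date(2026, 4, 22),
    modified=date(2026, 4, 22),
)
print(json.dumps(article, indent=2))
```

Passing date objects rather than strings means a malformed date fails before any markup is produced.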

2.2 FAQPage

Use on pages containing a question-and-answer section. Of the schemas covered here, this is the one LLMs can use most directly: it does the chunking work in advance by exposing self-contained Q/A pairs.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is RAG?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "RAG (Retrieval Augmented Generation) is a system that allows an LLM to consult external sources before generating its response, to produce cited and up-to-date answers."
      }
    }
  ]
}

Quality rule: each answer (text) must be self-contained (understandable without reading the question) and complete (no "see above" or other implicit reference). An answer of fewer than 40 words is often too short to be usable.
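
The quality rule lends itself to an automated check. Here is a rough Python sketch under stated assumptions: the 40-word threshold comes from the rule above, while the function name check_faq_answers and the list of back-reference phrases are illustrative choices, not an established standard.

```python
import re

MIN_WORDS = 40  # threshold taken from the quality rule above
# Hypothetical list of phrases that signal a dependency on surrounding text.
BACK_REFERENCES = ("see above", "see below", "as mentioned", "as stated earlier")


def check_faq_answers(faq_jsonld):
    """Return a list of (question, problem) tuples for a FAQPage dict."""
    problems = []
    for item in faq_jsonld.get("mainEntity", []):
        question = item.get("name", "")
        answer = item.get("acceptedAnswer", {}).get("text", "")
        # Count whitespace-separated tokens as words.
        word_count = len(re.findall(r"\S+", answer))
        if word_count < MIN_WORDS:
            problems.append((question, f"only {word_count} words"))
        lowered = answer.lower()
        for phrase in BACK_REFERENCES:
            if phrase in lowered:
                problems.append((question, f"contains '{phrase}'"))
    return problems


sample = {
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "What is RAG?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "RAG (Retrieval Augmented Generation) is a system that allows "
                    "an LLM to consult external sources before generating its "
                    "response, to produce cited and up-to-date answers.",
        },
    }],
}
for question, problem in check_faq_answers(sample):
    print(f"{question}: {problem}")
```

Note that the sample answer from the FAQPage example above has 25 words, so this check flags it; the threshold is a guideline ("often too short"), and a flagged answer deserves a re-read rather than an automatic rewrite.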

2.3 Organization

Place on the homepage or About page. This schema is the main vector of entity disambiguation: it links your site to your Wikidata, Wikipedia, LinkedIn and Crunchbase profiles via sameAs. These links give LLMs the means to build a coherent representation of your organisation.

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Name of your organisation",
  "url": "https://your-site.com",
  "logo": "https://your-site.com/logo.png",
  "description": "Factual description in 1-2 sentences",
  "foundingDate": "2024",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q...",
    "https://www.linkedin.com/company/...",
    "https://en.wikipedia.org/wiki/..."
  ]
}

2.4 BreadcrumbList

Place on all pages except the homepage. It is a strong site-structure signal that helps LLMs understand the thematic hierarchy of your content.

{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://your-site.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Insights",
      "item": "https://your-site.com/insights/"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "Schema.org and LLMs"
    }
  ]
}
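
Breadcrumb markup is usually generated rather than written by hand. As a small Python sketch (the function name build_breadcrumbs and the tuple-based input are assumptions made for illustration), the structure above can be produced from an ordered trail:

```python
def build_breadcrumbs(trail):
    """Build a BreadcrumbList dict.

    trail: list of (name, url_or_None) tuples ordered from the homepage down.
    """
    items = []
    for position, (name, url) in enumerate(trail, start=1):
        entry = {"@type": "ListItem", "position": position, "name": name}
        if url is not None:
            # The current page may omit "item", as in the example above.
            entry["item"] = url
        items.append(entry)
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }


crumbs = build_breadcrumbs([
    ("Home", "https://your-site.com/"),
    ("Insights", "https://your-site.com/insights/"),
    ("Schema.org and LLMs", None),
])
```

Deriving position from the list order avoids the gaps and duplicates that tend to appear when positions are typed manually.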

2.5 HowTo

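Use on pages describing a step-by-step procedure. Google retired its HowTo rich results in 2023, so the markup no longer produces a dedicated feature there, but it still describes the procedure explicitly: each HowToStep is a self-contained unit that a RAG pipeline can treat as a chunk for "how" queries.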

{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to optimise your site for LLMs",
  "step": [
    {
      "@type": "HowToStep",
      "position": 1,
      "name": "Step 1: Audit robots.txt",
      "text": "Verify that the major AI bots (GPTBot, PerplexityBot, ClaudeBot) are not blocked in your robots.txt."
    },
    {
      "@type": "HowToStep",
      "position": 2,
      "name": "Step 2: Add Article schema",
      "text": "Add a JSON-LD Article block with datePublished and dateModified on each content page."
    }
  ]
}

2.6 WebSite

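Place only on the homepage. It allows search engines and LLMs to understand that your site is a coherent entity. The potentialAction field (a SearchAction) used to enable the sitelinks search box in Google, a feature Google removed in late 2024; it is therefore omitted from the example below.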

{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Site name",
  "url": "https://your-site.com",
  "inLanguage": "en",
  "description": "Short site description"
}

Note: do not duplicate the WebSite schema across multiple pages - one instance on the homepage is sufficient.

3. Injecting multiple schemas on the same page

A common technique is to inject a JSON-LD array containing multiple schema objects in a single <script type="application/ld+json"> tag (the [...] below stands for the full list of questions):

<script type="application/ld+json">
[
  {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "...",
    "datePublished": "2026-04-22",
    "dateModified": "2026-04-22"
  },
  {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [...]
  }
]
</script>

This approach is accepted by Google and read correctly by any parser that handles JSON-LD arrays. Alternatively, two separate script tags can be used.
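
When the tag is produced by a template or build script, serialising the array with a JSON library avoids hand-made syntax errors. A minimal Python sketch follows; the function name render_jsonld_script is hypothetical, and escaping "</" is a defensive assumption so that a string value containing "</script>" cannot close the tag early.

```python
import json


def render_jsonld_script(schemas):
    """Serialise a list of schema dicts into one JSON-LD script tag."""
    payload = json.dumps(schemas, indent=2, ensure_ascii=False)
    # "\/" is a valid JSON escape for "/", so the parsed data is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


tag = render_jsonld_script([
    {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "Exact article title",
        "datePublished": "2026-04-22",
        "dateModified": "2026-04-22",
    },
    {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [],  # fill with Question objects
    },
])
print(tag)
```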

4. Schema errors that harm AI visibility

4.1 Schema inconsistent with visible content

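If your FAQPage schema lists questions absent from the visible HTML, the markup contradicts the page. Google's structured data guidelines require markup to match visible content, and any system that reads both the schema and the text receives conflicting signals. Rule: schema must reflect what a user would see on the page, not invisible or truncated content.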
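
This consistency rule can be approximated with a script. The sketch below uses only Python's standard library; the names VisibleText and missing_questions are illustrative, and the matching is deliberately naive (case-insensitive substring match on text outside script and style tags), so a question split across inline tags may be reported as missing.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.parts.append(data)


def normalise(text):
    """Collapse whitespace and lowercase for a tolerant comparison."""
    return " ".join(text.split()).lower()


def missing_questions(html, faq_jsonld):
    """Return the FAQPage question names that do not appear in the visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    visible = normalise(" ".join(parser.parts))
    missing = []
    for item in faq_jsonld.get("mainEntity", []):
        name = item.get("name", "")
        if normalise(name) not in visible:
            missing.append(name)
    return missing
```

This does not detect content hidden with CSS; it only confirms that each question exists somewhere in the page text.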

4.2 Incorrect or missing date data

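datePublished: "2026" gives only a year. ISO 8601 permits reduced precision, but a year alone says almost nothing about freshness, and Google's Article guidelines expect at least a full date. Use "2026-04-22" (YYYY-MM-DD) or "2026-04-22T10:00:00+00:00" (with time and timezone). A value that is not ISO 8601 at all, such as "22/04/2026", is likely to be ignored by parsers.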
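
A date check is easy to automate before deployment. The following Python sketch accepts only the two forms recommended above; the function name is_full_iso_date is hypothetical, and requiring a UTC offset on date-times is an assumption based on the "with time and timezone" recommendation.

```python
import re
from datetime import date, datetime

DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")
DATE_TIME = re.compile(r"\d{4}-\d{2}-\d{2}T.+")


def is_full_iso_date(value):
    """True for YYYY-MM-DD, or for a date-time that carries a UTC offset."""
    try:
        if DATE_ONLY.fullmatch(value):
            date.fromisoformat(value)  # rejects impossible dates such as month 13
            return True
        if DATE_TIME.fullmatch(value):
            return datetime.fromisoformat(value).tzinfo is not None
    except ValueError:
        return False
    return False


for candidate in ("2026-04-22", "2026-04-22T10:00:00+00:00", "2026", "22/04/2026"):
    print(candidate, is_full_iso_date(candidate))
```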

4.3 Organization.sameAs pointing to broken URLs

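A sameAs link only helps if it resolves to a page that actually describes your entity. Whether a given crawler or retrieval system checks this is not documented, but an empty Wikidata entry or a LinkedIn link that returns a 404 gives it nothing to confirm the association with, and so weakens the entity signal rather than strengthening it.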
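
A basic pre-deployment check is to confirm that every sameAs URL at least resolves. This is a hedged Python sketch using only the standard library: the names http_status and broken_same_as are illustrative, and the check covers HTTP resolution only, not whether the target page mentions your entity. Some sites reject automated requests, so a reported failure should be confirmed in a browser.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (OSError, ValueError):
        # Covers DNS failures, timeouts and malformed URLs.
        return None


def broken_same_as(org_jsonld, fetch_status=http_status):
    """Return (url, status) pairs for sameAs links that fail or return >= 400."""
    same_as = org_jsonld.get("sameAs", [])
    if isinstance(same_as, str):  # sameAs may be a single URL
        same_as = [same_as]
    broken = []
    for url in same_as:
        status = fetch_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

The fetch_status parameter is injectable so the logic can be exercised without network access, for example with a stub that maps URLs to status codes.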

4.4 WebSite schema duplicated across multiple pages

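A common issue with CMSs that automatically inject WebSite schema on every page. Google reads WebSite markup from the homepage (for example, to determine the site name), so copies on other pages add nothing and risk carrying inconsistent values. Limit WebSite to the homepage.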
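
Duplicates of this kind are straightforward to find once the JSON-LD of each page has been collected. The Python sketch below assumes a hypothetical input shape, a dict mapping each page URL to the list of its parsed JSON-LD objects; how that dict is built (crawler, CMS export) is left open.

```python
def pages_with_website_schema(pages, homepage_url):
    """Return the non-homepage URLs whose JSON-LD includes a WebSite object."""
    offenders = []
    for url, schemas in pages.items():
        if url == homepage_url:
            continue
        for schema in schemas:
            types = schema.get("@type", [])
            if isinstance(types, str):  # "@type" may be a string or a list
                types = [types]
            if "WebSite" in types:
                offenders.append(url)
                break
    return offenders


pages = {
    "https://your-site.com/": [{"@type": "WebSite"}, {"@type": "Organization"}],
    "https://your-site.com/article/": [{"@type": "Article"}, {"@type": "WebSite"}],
    "https://your-site.com/insights/": [{"@type": ["Article"]}],
}
print(pages_with_website_schema(pages, "https://your-site.com/"))
```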

4.5 Malformed JSON-LD

Invalid JSON (missing comma, unescaped quote, unclosed brace) causes the parser to completely ignore the block. Verify with the Rich Results Test before any deployment.
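
Syntax errors can also be caught locally, before the page reaches any external validator. This Python sketch extracts every application/ld+json block from an HTML string and tries to parse it; the names JsonLdExtractor and validate_jsonld are illustrative. It checks JSON syntax only, not schema.org vocabulary or rich result eligibility.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of each <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def validate_jsonld(html):
    """Return one (is_valid, error_message) tuple per JSON-LD block found."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    results = []
    for raw in extractor.blocks:
        try:
            json.loads(raw)
            results.append((True, ""))
        except json.JSONDecodeError as err:
            results.append((False, f"line {err.lineno}, column {err.colno}: {err.msg}"))
    return results
```

Run as part of a build or CI step, a check like this fails fast on a missing comma or brace; the Rich Results Test remains the reference for eligibility.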

5. Schema.org and AI surfaces in 2026

Surface	Most impactful schema	Expected impact
Google AI Overviews	Article, FAQPage, HowTo	Potentially strong: Q/A pairs and steps are easy to extract and cite, though Google states that no special markup is required
Microsoft Copilot (Bing)	Article, Organization	Moderate: improves Bing visibility as a RAG source
Perplexity	Article (dateModified)	Moderate: explicit freshness signal, useful for current queries
ChatGPT Search	Article, FAQPage	Moderate: same logic as Perplexity, via the underlying search index
LLMs (training)	Organization + sameAs	Long term: entity disambiguation, brand representation
Google Featured Snippets	FAQPage, HowTo	Indirect: snippets are drawn from visible content, which a clear Q/A or step structure helps

6. Action plan: implementation priorities

Week 1. Add Article schema with datePublished and dateModified on all content pages. Verify with the Rich Results Test.
Week 2. Add FAQPage schema on pages that already contain a FAQ section or Q/A. Re-read each answer to verify self-containment.
Week 3. Add Organization schema on the homepage with sameAs fields completed (Wikidata, LinkedIn, Wikipedia if available). Create the Wikidata entry if it does not exist.
Week 4. Add BreadcrumbList on all pages except the homepage. Audit existing schemas to detect WebSite duplicates and incorrect date formats.

Schema.org checklist for LLMs

Article schema with datePublished + dateModified on all content pages
dateModified updated on each substantial revision
FAQPage schema on pages with Q/A sections
Each FAQPage answer self-contained (at least 40 words)
Organization schema with sameAs on the homepage
WebSite schema only on the homepage (not duplicated)
BreadcrumbList schema on all pages except the homepage
JSON-LD validated via Rich Results Test before deployment
Schema/visible content consistency verified

FAQ

Is schema.org essential to be cited in LLMs?

Not essential - pages without schema are cited too. But schema.org improves the precision with which LLMs can interpret your content: page type, publication date, author, structured questions and answers. It reduces ambiguity and supports eligibility for enriched surfaces (rich results, AI Overviews, Bing Answers).

Which schema is most useful for AI SEO?

FAQPage is the most direct: it explicitly exposes questions and answers that LLMs can extract. Article or BlogPosting with dateModified improves freshness detection. Organization with sameAs creates the brand entity. In order of priority: 1) Article/BlogPosting, 2) FAQPage on suitable pages, 3) Organization on the homepage, 4) BreadcrumbList on all pages.

Can you use multiple schemas on the same page?

Yes, and it is recommended. An article can simultaneously carry an Article schema (publication information) and a FAQPage schema (if the article contains a FAQ section). The technique consists of injecting a JSON-LD array containing multiple objects. Google and LLMs read all of them.

How do you verify that your schema is correctly read?

Three tools: the Google Rich Results Test (search.google.com/test/rich-results) to verify validity and eligibility for rich results, the Schema.org Validator (validator.schema.org) for standard compliance, and the rich result reports in Google Search Console (under Enhancements) to monitor errors in production.

Does schema serve ChatGPT or Perplexity directly?

There is no official confirmation of direct use. Indirectly, yes: schema can improve visibility in Google and Bing, whose indexes are among the sources behind web RAG systems such as ChatGPT Search and Perplexity. Better visibility in these indexes increases the probability of being in the retrieval candidate pool. Additionally, training crawlers (GPTBot, ClaudeBot) fetch the full HTML, JSON-LD included, so the markup is at least available to them for establishing the nature and date of content.
