What Anthropic, OpenAI, and Google's Skill Docs Reveal About Getting Cited by AI

The three labs document their AI skills almost identically. The surprising part is what that shared playbook does, and does not, do for getting cited by AI.

Anthropic, OpenAI, and Google now document their AI "skills" in almost the same way: one open file format, a Markdown mirror of every page, and an index built for machines. It is a clean playbook. It is also built for AI agents to read the docs, not for AI answer engines to cite them. Those are two different games, and most teams are only playing one.

We read the rendered HTML of all three vendors' skill documentation and cross-checked it against the public research on what AI engines actually cite. Here is what the comparison shows, and what it means if your goal is to become a source that AI quotes.

#The shared playbook

All three put skills on the same open standard: a folder with a SKILL.md manifest that carries a name, a description, and a body that loads progressively, so the model sees a one-line summary first and the full instructions only when a task matches. Around that, all three wrap the docs in an identical machine-ingestion layer.

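| Signal | Claude | OpenAI | Gemini CLI |
| --- | --- | --- | --- |
| SKILL.md open standard | Yes | Yes | Yes |
| llms.txt index | Yes | Yes | Yes |
| Per-page Markdown twin | Yes | Yes | Yes |
| JSON-LD structured data | None | None | None |
| In-docs AI chat | Yes (Inkeep) | Yes (custom) | No |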
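
To make the progressive loading described above concrete, here is a minimal sketch in Python. It assumes the manifest opens with a `---` delimited header of simple `key: value` lines. The `Skill` class, the helper names, the keyword-overlap matching, and the example skill are all hypothetical illustrations, not any vendor's loader.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full instructions, only surfaced when a task matches


def parse_skill(text: str) -> Skill:
    """Split a SKILL.md-style document into header fields and body.

    Assumes a '---' delimited header made of simple 'key: value' lines.
    """
    _, header, body = text.split("---", 2)
    fields = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return Skill(fields["name"], fields["description"], body.strip())


def summary(skill: Skill) -> str:
    """The one-line view shown before any task matches."""
    return f"{skill.name}: {skill.description}"


def load_for_task(skill: Skill, task: str) -> str:
    """Return the full body only when the task shares a keyword with the description.

    Keyword overlap is a stand-in (an assumption) for whatever matching a real agent does.
    """
    keywords = {w.lower().strip(".,") for w in skill.description.split() if len(w) > 4}
    task_words = {w.lower().strip(".,") for w in task.split()}
    return skill.body if keywords & task_words else summary(skill)


# Hypothetical skill used only for illustration.
EXAMPLE = """---
name: changelog-writer
description: Drafts release changelog entries from merged commits.
---
1. Group commits by type.
2. Write one line per user-facing change.
"""
```

Calling `load_for_task(parse_skill(EXAMPLE), "Write the changelog for v2")` returns the full instructions, while an unrelated task gets only the one-line summary. That is the whole idea: the description is cheap to keep in context, and the body is paid for only when needed.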

The convergence is the story. Three competitors independently landed on the same format, the same Markdown-for-machines plumbing, and the same decision to ship no structured data on their docs pages. That last one is worth sitting with, because the conventional SEO advice says the opposite.

#The surprising part: llms.txt does not get you cited

The llms.txt file is the centerpiece of the shared playbook, a Markdown index that tells AI tools where everything is. The evidence that it drives AI citations is, at best, absent.

10% of sites have adopted llms.txt; in a 300,000-domain study, removing it from the citation model improved accuracy rather than hurting it (SE Ranking).

A separate analysis of more than 515 million AI-bot requests found the answer-engine crawlers almost never fetch the file; they read the HTML directly. Google has said on the record that its AI search relies on the same signals as the rest of Search, and its guidance never mentions llms.txt.

So why do all three labs ship it? Because it works for a different reader. Coding agents like Cursor, Claude Code, and Copilot do fetch llms.txt and the per-page Markdown when you point them at a docs site. The file is real infrastructure for agents reading your docs. It is just not a lever for answer engines citing them. If you publish it expecting ChatGPT to quote you more, you are optimizing the wrong reader.
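
For readers who have not opened one, an llms.txt file is a short Markdown index: a title, a one-line summary, and sections of links. The sketch below generates one in Python, assuming the layout described in the public llms.txt proposal. The `build_llms_txt` helper, the site name, and the URLs are hypothetical.

```python
def build_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render a Markdown index: H1 title, blockquote summary, H2 sections of links.

    `sections` maps a section heading to a list of (name, url, note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site used only for illustration.
index = build_llms_txt(
    "Example Docs",
    "Docs for the Example API.",
    {"Docs": [("Quickstart", "https://example.com/quickstart.md", "Install and first call")]},
)
```

Pointing the links at per-page Markdown twins, as in the example URL, is what lets a coding agent pull clean text instead of scraping HTML.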

#What actually earns a citation

Three things, in order of weight.

First, earned authority. When researchers traced over a million AI citations, the overwhelming majority pointed at third-party editorial sources, not the brand's own blog.

Where AI citations point (share of cited links):

| Source type | Share |
| --- | --- |
| Earned / third-party editorial | ~89% |
| Owned, paid, and other | ~11% |

From an analysis of over 1 million links cited by AI tools: 95% came from non-paid sources, of which 89% were earned media. The exact figure varies by report and engine; the direction is consistent across them.

Source: Muck Rack, What Is AI Reading? (2025)

Second, this is a separate game from SEO. One analysis of roughly 40,000 queries found that 88% of Google's AI Mode citations were not in the organic top ten results. Ranking well does not mean getting cited; they are nearly independent systems.

Third, the content itself, with one important correction to the popular advice. The famous result here is the Generative Engine Optimization paper (Aggarwal et al., 2024), which reported that adding statistics, citations, and quotations lifted visibility by up to 40%. That number is everywhere in GEO advice. The problem is that the study measured a custom research engine, not a production platform.

When a later analysis tried to replicate it across 3,205 pages on four live engines (ChatGPT, Claude, Perplexity, Google AI Mode), only one of the three levers held up. Statistics survived, and strongly: pages with higher numeric density were significantly more likely to be cited, from about 21% more likely on Google AI Mode to 121% more likely on Claude. The other two inverted: pages with more citations and quotations were less likely to be cited, not more.
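
If you want a rough way to compare your own drafts, numeric density is easy to approximate. The sketch below counts numeric tokens per 100 words in Python. That definition is an assumption for illustration; the analysis cited above may have measured it differently, and the `numeric_density` name is hypothetical.

```python
import re

# A number is a digit run with optional thousands separators, decimals, and a percent sign.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?%?")
WORD = re.compile(r"\S+")


def numeric_density(text: str) -> float:
    """Numeric tokens per 100 whitespace-separated words (0.0 for empty text)."""
    words = WORD.findall(text)
    if not words:
        return 0.0
    return 100 * len(NUMBER.findall(text)) / len(words)


vague = "Many sites have adopted the file and most crawlers ignore it."
specific = "10% of sites adopted it across 300,000 domains."
```

Here `numeric_density(vague)` is zero and `numeric_density(specific)` is well above it. Treat the score as a relative signal between drafts, not as a target to hit.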

#The engines do not agree

There is no single "AI" to optimize for. A 1,056-datapoint analysis of where different engines pull citations found sharply different habits: ChatGPT leans encyclopedic and cites Wikipedia heavily, while Perplexity and Google lean on video. Claude was the outlier for technical work. In the window studied, it cited brand domains and primary or institutional sources, and effectively no YouTube, Wikipedia, or Reddit.

The takeaway for anyone publishing technical or specialized content: Claude is the engine most likely to cite a well-built primary source, because that is nearly all it cites. If your material is formal and first-hand, you are writing for the reader most inclined to quote you.

#Write so a machine can lift it

AI answer engines do not read your page; they retrieve a passage from it. Their pipeline embeds your content, searches for the chunk that best matches a query, and quotes that chunk with attribution. The unit of citation is the section, not the article.

That changes how you structure a page:

  • Keep each answer self-contained: state the core answer in two to four sentences, and keep the whole section under roughly 300 words so it fits inside a single retrieved chunk.
  • Front-load the core answer in the first 150 words of the page; that opening window is the highest-value real estate for retrieval.
  • Use tables and lists for comparisons and specs. They are clean extraction targets a model can lift without paraphrasing.
  • Make sure the content you want cited exists in the server-rendered HTML. Many engines do not run JavaScript when they retrieve, so anything injected by the browser is invisible to them.
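
The two length rules in the list above are simple to check before you publish. Here is a minimal Python sketch that splits a Markdown page on heading lines, flags sections over a word budget, and returns the opening window for review. The function names are hypothetical, and the 300 and 150 defaults come from the guidance above, not from any engine's documented chunk size.

```python
def split_sections(markdown: str) -> dict[str, str]:
    """Map each heading to the text beneath it; text before any heading goes under 'intro'.

    Any line starting with '#' is treated as a heading, and repeated
    heading titles overwrite each other. Good enough for a sketch.
    """
    sections: dict[str, list[str]] = {"intro": []}
    current = "intro"
    for line in markdown.splitlines():
        if line.startswith("#"):
            current = line.lstrip("#").strip()
            sections[current] = []
        else:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}


def oversized_sections(markdown: str, max_words: int = 300) -> list[str]:
    """Headings whose section text exceeds the word budget."""
    return [
        title
        for title, body in split_sections(markdown).items()
        if len(body.split()) > max_words
    ]


def opening_window(markdown: str, words: int = 150) -> str:
    """The first `words` words of body text, headings excluded."""
    body = " ".join(line for line in markdown.splitlines() if not line.startswith("#"))
    return " ".join(body.split()[:words])
```

Read the output of `opening_window` on its own: if it does not answer the page's main question without the rest of the article, the answer is not front-loaded yet.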

#If you are publishing in 2026

The labs' docs are a useful mirror. They are excellent at machine-readability and weak at exactly the spots where an independent publisher can win:

  • Lead with the answer, and back it with specific numbers. Numeric density is the one content lever that replicates across live engines; vague prose is not citable.
  • Keep your entity identity consistent. A clear Organization and author identity, with structured data that links to your real public profiles, is the one piece of schema worth shipping. It helps engines resolve who you are. The rest of the schema stack is a last-mile optimizer; LLMs tokenize it but do not parse it.
  • Beat the giants on freshness. All three labs mostly skip machine-readable last-modified dates. Freshness is one of the signals most associated with citation, and it is nearly free to maintain.
  • Earn mentions off your own domain. Your blog alone hits a ceiling. Third-party references are what break it.
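
The identity and freshness items above fit in one small block of JSON-LD. The Python sketch below builds it using schema.org's `Article`, `Person`, and `Organization` types with `sameAs` and `dateModified`. The `article_jsonld` helper and every name, URL, and date in the example are hypothetical placeholders.

```python
import json
from datetime import date


def article_jsonld(*, headline: str, author_name: str, author_profiles: list,
                   org_name: str, org_url: str, org_profiles: list,
                   modified: date) -> str:
    """Serialize a schema.org Article with author, publisher identity, and a last-modified date."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "dateModified": modified.isoformat(),  # machine-readable freshness signal
        "author": {
            "@type": "Person",
            "name": author_name,
            "sameAs": author_profiles,  # links to the author's real public profiles
        },
        "publisher": {
            "@type": "Organization",
            "name": org_name,
            "url": org_url,
            "sameAs": org_profiles,
        },
    }
    return json.dumps(data, indent=2)


# Placeholder values used only for illustration.
snippet = article_jsonld(
    headline="Example headline",
    author_name="Jane Example",
    author_profiles=["https://example.com/people/jane"],
    org_name="Example Co",
    org_url="https://example.com",
    org_profiles=["https://example.org/profiles/example-co"],
    modified=date(2026, 1, 15),
)
```

The resulting string would go inside a `<script type="application/ld+json">` tag in the server-rendered HTML, with `dateModified` updated whenever the page actually changes.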

The shared playbook makes your docs legible to agents. Getting cited by answer engines is a different discipline, built on authority, clear identity, and content a machine can quote cleanly. Run both on purpose.

#FAQ

#Does llms.txt help my content get cited by AI?

Not on current evidence. Large-scale studies show answer-engine crawlers rarely fetch it, and Google has said it does not use it. It is genuinely useful for coding agents reading your docs, which is a real but separate benefit.

#Is structured data worth adding for AI citations?

A little, and selectively. Organization and author schema help engines resolve your identity, which matters. Beyond that, structured data is a minor optimizer; research shows LLMs tokenize the markup as text rather than parsing it as schema, so visible on-page structure does more work.

#Which AI engine is most likely to cite a technical blog?

Claude, based on a 1,056-datapoint analysis of citation behavior. It draws heavily on brand and primary or institutional sources and largely avoids user-generated platforms, which favors formal, first-hand technical content.

#Does adding citations and quotations help my content get cited?

Counterintuitively, no, based on a 3,205-page replication across four live engines. Numeric density helped, but more citations and quotations correlated with being cited less, not more. The widely repeated "+40% from adding citations" figure came from a custom research engine, not a production one. Cite your sources for honesty and reader trust, which matter, but do not expect attribution density itself to win citations. Specific numbers are the lever that holds up.

If you want help making your own content and systems legible to AI, that is the work we do at EP. See how we approach it.