Ena Pragma

You can't optimize for 'the AI.' There isn't one.

Getting cited inside ChatGPT, Perplexity, and AI Overviews is real but mostly mis-sold. What the controlled studies show works, and what is snake oil.

Every operator we talk to is asking some version of the same question: when my customer asks ChatGPT instead of Googling, how do I show up in the answer?

It is the right question. It also has a wrong premise buried in it. There is no "the AI." There is no single index, no single ranking, no single thing to optimize. The assistant your buyer is typing into is a thin layer over a retrieval system, and the retrieval system is different for every assistant. Get that wrong and you will spend a quarter optimizing for a mechanism that the engine you care about does not even use.

So before any tactic, the load-bearing fact:

It depends on which index the engine grounds on

"Show up in an LLM" is not one problem. Whether your brand can be cited at all depends on which index the engine retrieves from, and whether it ran a live search for that query in the first place.

Engine | What it grounds on | What that means for you
------ | ------------------ | -----------------------
ChatGPT (search mode) | Bing's index, plus OpenAI's own crawler | Be crawlable and indexed in Bing
ChatGPT (no browsing) | Its training data only | You are nameable only if your entity was learned before the cutoff
Perplexity | Its own crawler and index | Allow PerplexityBot and Perplexity-User. Not Google or Bing based
Google AI Overviews | Google's index, via Gemini | Be indexed and snippet eligible. No new technical requirement
Microsoft Copilot | Bing's index | Same play as ChatGPT search
Claude (web search on) | Web-search tool results, else training data | Dual nature, like ChatGPT

Read that table twice, because it kills most of the advice you have been sold. A tactic that helps you in Perplexity (which crawls the open web itself) may do nothing in a no-browsing ChatGPT answer (which can only name what it learned in training). "Rank in AI" is a category error. You rank, or fail to rank, one engine at a time.

The fact that breaks your SEO instinct

Here is the result that surprises every marketer who came up through search. Chat assistants do not mirror the Google results page.

12% of URLs cited by AI assistants also rank in Google's top 10 for the same prompt

On average, only about 12 percent of the URLs that assistants cite also rank in Google's top 10 for the original prompt, and roughly 80 percent do not rank anywhere in Google's top 100 (Ahrefs, 15k-prompt study). Perplexity is the closest to classic search at 28.6 percent top-10 overlap; ChatGPT, Gemini, and Copilot sit near 8 percent each. Google's AI Overviews are the exception that proves the rule: about 76 percent of their citations come from pages that already rank in the top 10, because they ride Google's own index.

The reason is mechanical. Assistants do not take your query and rank ten links. They fan a single prompt out into many query variants, retrieve for each, and fuse the results. So what correlates with getting cited is not your exact URL's position for one keyword. It is domain-level authority across a whole cluster of related queries (Semrush). Page-one-for-a-keyword is the old game. Being the source a topic resolves to is the new one.
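The fan-out-and-fuse step can be sketched in a few lines. The studies cited here do not say which fusion method any engine uses, so this sketch swaps in reciprocal rank fusion, a common and simple technique, purely as an illustration. The URLs, the three query variants, and the constant `k = 60` are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists; each document scores sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retrieval results for three variants of one buyer prompt.
variant_results = [
    ["bigbrand.test/pricing", "niche-guide.test/explainer", "forum.test/thread"],
    ["niche-guide.test/explainer", "wiki.test/topic", "bigbrand.test/pricing"],
    ["wiki.test/topic", "niche-guide.test/explainer", "news.test/story"],
]

fused = reciprocal_rank_fusion(variant_results)
for doc, score in fused:
    print(f"{score:.4f}  {doc}")
```

In this toy data the explainer page comes out on top even though it is first for only one variant, because it appears near the top for all three. That is the sense in which coverage across a query cluster can matter more than position for a single keyword.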

What actually works, ranked by evidence

Strip away the vendor decks and a short list survives, ordered by how much real evidence stands behind it.

1. Be indexed and authoritative in Google and Bing. Unglamorous, and the single strongest correlate in every large study. Retrieval-augmented generation retrieves top-ranked results and then writes from them. If you are not in the index the engine grounds on, nothing downstream matters.

2. Format content so a model can lift it: hard statistics, direct quotations, cited primary sources. This is the one lever with controlled-experiment evidence behind it. The Princeton and IIT-Delhi "Generative Engine Optimization" study (KDD 2024) tested content strategies head to head and found that adding statistics, quotations, and citations raised a source's visibility in generated answers by up to roughly 40 percent (arXiv:2311.09735).

~40% relative visibility lift from adding stats, quotes, and citations (Princeton GEO study)

The same study found that keyword stuffing underperformed the baseline. Sit with that. The tactic the snake-oil vendors still sell is the one that measured worse than doing nothing.

3. Get cited by the small set of domains the engines already trust. LLM citations concentrate hard on a handful of sources: Wikipedia, Reddit, YouTube, LinkedIn, and the authoritative sites of your niche. Wikipedia alone is ChatGPT's single most-cited source (Profound, 680M-citation study). The goal is not to outrank that set. It is to get named inside it.

4. Win the entity layer. A clear, consistent presence in the knowledge graph (Wikipedia where you are notable, consistent organization facts, a coherent entity across the web) is what survives the no-browsing case, where the model answers from weights alone and can only name what it already learned. The entity layer is the only lever that reaches the pure-weights answer.

5. Let the AI crawlers in. GPTBot, OAI-SearchBot, PerplexityBot, Perplexity-User, Google-Extended. If your robots rules or your firewall block them, you have opted out of every retrieval-mode answer. Verify in Google Search Console and Bing Webmaster Tools, not by assumption.
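A quick first pass on the robots side can be scripted. The sketch below uses Python's standard `urllib.robotparser` to check which of the crawler tokens named above a given robots.txt would block for one URL. The sample robots.txt and the `example.com` URL are hypothetical, and the token list is copied from the text, so confirm current names in each vendor's documentation. It says nothing about firewall or CDN rules, which need to be checked separately.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens as listed in the text; confirm current names in each vendor's docs.
AI_CRAWLERS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "Perplexity-User", "Google-Extended")


def blocked_crawlers(robots_txt, url, agents=AI_CRAWLERS):
    """Return the agents that the given robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one AI crawler and allows everything else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(SAMPLE_ROBOTS, "https://example.com/pricing"))
# → ['GPTBot']
```

To run it against a live site, fetch the site's `/robots.txt` and pass its text as `robots_txt`. An empty list means the robots rules let all five tokens through for that URL.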

The graveyard

Now the part most posts will not tell you, because they are selling the items in it.

  • llms.txt. No announced support from any major engine. Ship it if you like; do not pay for it.
  • Schema markup sold as a cause of rankings or citations. Structured data is a comprehension aid, worth doing for entity clarity and rich results. It is not a ranking factor, and Google has said so directly. Useful, not magic.
  • Keyword density and "AI keyword optimization." Tested, underperforms baseline. See above.
  • "Authoritative tone" rewrites with no new facts. Among the weakest interventions in the controlled study. Fluency without substance does not move citations.
  • "Guaranteed citation" and paid placement into organic answers. No engine exposes this. The engines actively de-bias against over-cited domains; ChatGPT cut its reliance on Reddit and Wikipedia in late 2025 specifically to reduce manipulation (Semrush). Anyone guaranteeing a citation is guaranteeing something they do not control.

The play, if you are a serious B2B brand

Stop chasing per-page citations. Become the canonical reference on your topic, and the citations follow.

  1. Own the explainer layer. For every concept your buyers ask about, be the most thorough, most current, most quotable source: hard numbers, direct quotes from primary sources, real citations. This is the Princeton-validated mechanism, and it compounds, because the topics that change every year reward whoever keeps the definitive page fresh.
  2. Win the entity layer so both grounded and pure-weights answers resolve your topic to you.
  3. Earn mentions in the domains the engines already trust in your category. Forensically test what each engine cites today for your real buyer questions, then go earn placement in those exact sources.
  4. Be crawlable and indexed in Google and Bing. The prerequisite for everything above.
  5. Instrument it. Track your share of citations per engine, monthly. The patterns are volatile and shift week to week. If you are not measuring per engine, you are guessing.
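The instrumentation step can start as a small script. This is a minimal sketch that assumes you already log one row per citation observed when running a fixed set of buyer questions through each engine. The row format, the engine labels, and every domain below are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical log rows: (month, engine, cited_url), one row per observed citation.
observations = [
    ("2026-01", "perplexity", "https://yourbrand.test/explainer"),
    ("2026-01", "perplexity", "https://wiki.test/topic"),
    ("2026-01", "chatgpt", "https://wiki.test/topic"),
    ("2026-01", "chatgpt", "https://forum.test/thread"),
    ("2026-02", "perplexity", "https://yourbrand.test/explainer"),
    ("2026-02", "chatgpt", "https://yourbrand.test/guide"),
    ("2026-02", "chatgpt", "https://wiki.test/topic"),
]


def citation_share(rows, domain):
    """Map (month, engine) -> fraction of that engine's citations pointing at `domain`."""
    totals = Counter()
    hits = Counter()
    for month, engine, url in rows:
        key = (month, engine)
        totals[key] += 1
        if urlsplit(url).netloc.lower() == domain:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in sorted(totals)}


for (month, engine), share in citation_share(observations, "yourbrand.test").items():
    print(f"{month}  {engine:<11} {share:.0%}")
```

Keeping the key as (month, engine) rather than pooling all engines is the point: a pooled number would hide the case where one engine cites you often and another never does.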

None of this is a trick played on a model. It is being the most cited, most authoritative, most quotable source on the questions your buyers actually ask. That was always the work. The only thing that changed is where the answer gets rendered.

The brands that win the answer layer will not be the ones with the cleverest llms.txt. They will be the ones a model cannot describe the topic without quoting.

