June 2, 2026·7 min read·generative-engine-optimization · ai-search · content-strategy

You can't optimize for 'the AI.' There isn't one.

Getting cited inside ChatGPT, Perplexity, and AI Overviews is real but mostly mis-sold. What the controlled studies show works, and what is snake oil.

Every operator we talk to is asking some version of the same question: when my customer asks ChatGPT instead of Googling, how do I show up in the answer?

It is the right question. It also has a wrong premise buried in it. There is no "the AI." There is no single index, no single ranking, no single thing to optimize. The assistant your buyer is typing into is a thin layer over a retrieval system, and the retrieval system is different for every assistant. Get that wrong and you will spend a quarter optimizing for a mechanism that the engine you care about does not even use.

So before any tactic, the load-bearing fact:

#It depends on which index the engine grounds on

"Show up in an LLM" is not one problem. Whether your brand can be cited at all depends on which index the engine retrieves from, and whether it ran a live search for that query in the first place.

Engine	What it grounds on	What that means for you
ChatGPT (search mode)	Bing's index, plus OpenAI's own crawler	Be crawlable and indexed in Bing
ChatGPT (no browsing)	Its training data only	You are nameable only if your entity was learned before the cutoff
Perplexity	Its own crawler and index	Allow PerplexityBot and Perplexity-User. Not Google- or Bing-based
Google AI Overviews	Google's index, via Gemini	Be indexed and snippet eligible. No new technical requirement
Microsoft Copilot	Bing's index	Same play as ChatGPT search
Claude (web search on)	Web-search tool results, else training data	Dual nature, like ChatGPT

Read that table twice, because it kills most of the advice you have been sold. A tactic that helps you in Perplexity (which crawls the open web itself) may do nothing in a no-browsing ChatGPT answer (which can only name what it learned in training). "Rank in AI" is a category error. You rank, or fail to rank, one engine at a time.

#The fact that breaks your SEO instinct

Here is the result that surprises every marketer who came up through search. Chat assistants do not mirror the Google results page.

12% of URLs cited by AI assistants also rank in Google's top 10 for the same prompt

On average, only about 12 percent of the URLs that assistants cite also rank in Google's top 10 for the original prompt, and roughly 80 percent do not rank anywhere in Google's top 100 (Ahrefs, 15k-prompt study). Perplexity is the closest to classic search at 28.6 percent top-10 overlap; ChatGPT, Gemini, and Copilot sit near 8 percent each. Google's AI Overviews are the exception that proves the rule: about 76 percent of their citations come from pages that already rank in the top 10, because they ride Google's own index.

The reason is mechanical. Assistants do not take your query and rank ten links. They fan a single prompt out into many query variants, retrieve for each, and fuse the results. So what correlates with getting cited is not your exact URL's position for one keyword. It is domain-level authority across a whole cluster of related queries (Semrush). Page-one-for-a-keyword is the old game. Being the source a topic resolves to is the new one.
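To make the fan-out-and-fuse mechanism concrete, here is a minimal sketch. It assumes reciprocal rank fusion as the fusion step, which is a common technique but not something any engine has confirmed it uses. The URLs, the `k=60` constant, and the function names are all hypothetical. The point it illustrates: a domain that appears across several query variants can outscore a domain that holds the top position for only one of them.

```python
from collections import defaultdict
from urllib.parse import urlparse


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked URL lists into one score per URL.

    Each URL earns 1 / (k + rank) for every list it appears in,
    so showing up in many lists can beat one top position.
    """
    scores = defaultdict(float)
    for urls in ranked_lists:
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    return dict(scores)


def domain_scores(url_scores):
    """Sum the fused URL scores per domain."""
    totals = defaultdict(float)
    for url, score in url_scores.items():
        totals[urlparse(url).netloc] += score
    return dict(totals)


# Hypothetical retrieval results for three variants of one prompt.
fanned_out_results = [
    ["https://one-hit.example/best-crm", "https://broad.example/crm-guide"],
    ["https://broad.example/crm-pricing", "https://other.example/pricing"],
    ["https://broad.example/crm-vs-erp", "https://other.example/compare"],
]

fused = reciprocal_rank_fusion(fanned_out_results)
by_domain = domain_scores(fused)
top_domain = max(by_domain, key=by_domain.get)
print(top_domain)  # prints "broad.example"
```

In this toy data, `one-hit.example` ranks first for one variant and nowhere else, while `broad.example` places in all three and wins the fused ranking.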

#What actually works, ranked by evidence

Strip away the vendor decks and a short list survives, ordered by how much real evidence stands behind it.

1. Be indexed and authoritative in Google and Bing. Unglamorous, and the single strongest correlate in every large study. Retrieval-augmented generation retrieves the top-ranked results for each query variant and then writes from them. If you are not in the index the engine grounds on, nothing downstream matters.

2. Format content so a model can lift it: hard statistics, direct quotations, cited primary sources. This is the one lever with controlled-experiment evidence behind it. The Princeton and IIT-Delhi "Generative Engine Optimization" study (KDD 2024) tested content strategies head to head and found that adding statistics, quotations, and citations raised a source's visibility in generated answers by up to roughly 40 percent (arXiv:2311.09735).

~40% relative visibility lift from adding stats, quotes, and citations (Princeton GEO study)

The same study found that keyword stuffing underperformed the baseline. Sit with that. The tactic the snake-oil vendors still sell is the one that measured worse than doing nothing.

3. Get cited by the small set of domains the engines already trust. LLM citations concentrate hard on a handful of sources: Wikipedia, Reddit, YouTube, LinkedIn, and the authoritative sites of your niche. Wikipedia alone is ChatGPT's single most-cited source (Profound, 680M-citation study). The goal is not to outrank that set. It is to get named inside it.

4. Win the entity layer. A clear, consistent presence in the knowledge graph (Wikipedia where you are notable, consistent organization facts, a coherent entity across the web) is what survives the no-browsing case, where the model answers from weights alone and can only name what it already learned. The entity layer is the only lever that reaches the pure-weights answer.

5. Let the AI crawlers in. GPTBot, OAI-SearchBot, PerplexityBot, Perplexity-User, Google-Extended. If your robots rules or your firewall block them, you have opted out of every retrieval-mode answer. Verify in Google Search Console and Bing Webmaster Tools, not by assumption.
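For the crawler check in item 5, a minimal sketch using Python's standard `urllib.robotparser` can flag which of the tokens named above a robots.txt file would block. The sample robots.txt, the URL, and the `blocked_agents` helper are hypothetical. This only evaluates robots rules; it cannot see a firewall or CDN block, so treat it as a first pass rather than proof of access.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in the list above.
AI_USER_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "PerplexityBot",
    "Perplexity-User",
    "Google-Extended",
]


def blocked_agents(robots_txt, url, agents=AI_USER_AGENTS):
    """Return the agents that the given robots.txt text disallows for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks GPTBot and allows everyone else
# outside of /admin/.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

blocked = blocked_agents(SAMPLE_ROBOTS, "https://example.com/pricing")
print(blocked)  # prints ['GPTBot']
```

To run it against a live site, you would fetch that site's own robots.txt text and pass it in place of `SAMPLE_ROBOTS`.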

#The graveyard

Now the part most posts will not tell you, because they are selling the items in it.

llms.txt. No announced support from any major engine. Ship it if you like; do not pay for it.
Schema markup sold as a ranking or citation cause. Structured data is a comprehension aid, worth doing for entity clarity and rich results. It is not a ranking factor, and Google has said so directly. Useful, not magic.
Keyword density and "AI keyword optimization." Tested, underperforms baseline. See above.
"Authoritative tone" rewrites with no new facts. Among the weakest interventions in the controlled study. Fluency without substance does not move citations.
"Guaranteed citation" and paid placement into organic answers. No engine exposes this. The engines actively de-bias against over-cited domains; ChatGPT cut its reliance on Reddit and Wikipedia in late 2025 specifically to reduce manipulation (Semrush). Anyone guaranteeing a citation is guaranteeing something they do not control.

#The play, if you are a serious B2B brand

Stop chasing per-page citations. Become the canonical reference on your topic, and the citations follow.

Own the explainer layer. For every concept your buyers ask about, be the most thorough, most current, most quotable source: hard numbers, direct quotes from primary sources, real citations. This is the Princeton-validated mechanism, and it compounds, because the topics that change every year reward whoever keeps the definitive page fresh.
Win the entity layer so both grounded and pure-weights answers resolve your topic to you.
Earn mentions in the domains the engines already trust in your category. Forensically test what each engine cites today for your real buyer questions, then go earn placement in those exact sources.
Be crawlable and indexed in Google and Bing. The prerequisite for everything above.
Instrument it. Track your share of citations per engine, monthly. The patterns are volatile and shift week to week. If you are not measuring per engine, you are guessing.
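The instrumentation step can start very small. The sketch below assumes you already log, by whatever means, one `(engine, cited_url)` pair per citation observed when you run your real buyer questions. The sample data, the domain, and the `citation_share` function are hypothetical, and how you collect the observations is out of scope here.

```python
from collections import Counter
from urllib.parse import urlparse


def citation_share(observations, domain):
    """Return, per engine, the fraction of cited URLs that belong to domain.

    observations: iterable of (engine, cited_url) pairs.
    A URL counts if its host is the domain itself or a subdomain of it.
    """
    totals = Counter()
    hits = Counter()
    for engine, url in observations:
        totals[engine] += 1
        host = urlparse(url).netloc.lower()
        if host == domain or host.endswith("." + domain):
            hits[engine] += 1
    return {engine: hits[engine] / totals[engine] for engine in totals}


# Hypothetical citations logged for one month of test prompts.
SAMPLE = [
    ("perplexity", "https://www.yourbrand.example/guide"),
    ("perplexity", "https://en.wikipedia.org/wiki/Topic"),
    ("chatgpt", "https://www.reddit.com/r/topic"),
    ("chatgpt", "https://en.wikipedia.org/wiki/Topic"),
    ("chatgpt", "https://yourbrand.example/pricing"),
    ("chatgpt", "https://www.youtube.com/watch"),
]

shares = citation_share(SAMPLE, "yourbrand.example")
print(shares)
```

With this sample the share is 0.5 for Perplexity and 0.25 for ChatGPT. Keeping the numbers separate per engine is the point: a blended average would hide exactly the differences the table at the top describes.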

None of this is a trick played on a model. It is being the most cited, most authoritative, most quotable source on the questions your buyers actually ask. That was always the work. The only thing that changed is where the answer gets rendered.

The brands that win the answer layer will not be the ones with the cleverest llms.txt. They will be the ones a model cannot describe the topic without quoting.

#Sources

GEO: Generative Engine Optimization, Aggarwal et al., KDD 2024. arXiv:2311.09735
AI citations vs. Google rankings, 15k-prompt study. Ahrefs
AI Mode citation analysis and most-cited domains. Semrush
AI platform citation patterns, 680M citations. Profound
AI features and your website. Google Search Central
Perplexity crawlers. Perplexity docs
