Your AEO Win Might Just Be ChatGPT Growing Up

A 2026 log-based natural experiment shows most of the lift in a headline AEO case study came from platform growth, not the optimization. Why marketers should control for platform tailwind before trusting any AEO success multiple.

A new natural experiment shows most of the lift in a famous case study had nothing to do with the optimization.

Here is an uncomfortable possibility. That impressive AEO case study you bookmarked, the one where ChatGPT referrals jumped 5x after some tweaks, may have measured the wrong thing.

A 2026 paper by Watanabe and Nakayashiki ran the numbers on a single high-traffic website and found that most of the "win" was the platform rising, not the optimization working. AEO, short for Answer Engine Optimization, is the practice of tuning your content so AI assistants like ChatGPT cite and recommend you. The study is worth reading in full (arXiv:2606.04362), but the short version is a useful gut check for anyone selling or buying AEO services.

What they actually did

Most AEO claims rely on third-party traffic estimators. This study did not. The authors used first-party analytics and raw server logs from one real domain, so the referral counts are measured, not modeled.

The setup is what makes it interesting. In January 2026 the team applied AEO changes to one section of the site and left the rest untouched. That untreated remainder became the control group: same domain, same brand, same time window, just no intervention.

This matters because of something the paper calls platform tailwind, the background growth in AI referral traffic that happens to everyone whether they optimize or not. If ChatGPT is sending more clicks to the whole web, your traffic goes up even if you do nothing. A proper test has to separate that tide from your own rowing.

The headline number is mostly tide

Here is the part that should make you skeptical of round-number case studies.

Total ChatGPT referrals to the studied domain grew 5.7x over the window. That is the kind of figure that ends up in a LinkedIn post. But the control pages, the ones that got no AEO work at all, grew 3.5x in the very same period.

So a large share of the "5.7x from AEO" story was happening to pages nobody touched. If you only looked at the treated pages and ignored the control, you would credit the platform's growth to your own intervention. That is the core mistake the paper is built to expose.
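As a rough back-of-envelope check, and not the paper's method, you can divide one growth multiple by the other. The helper name below is made up for illustration. One caveat: 5.7x is the figure for the whole domain, control pages included, so this ratio is a sanity check on scale rather than an effect estimate.

```python
def control_adjusted_lift(observed_multiple: float, control_multiple: float) -> float:
    """Growth multiple expressed relative to what untouched control pages did."""
    if control_multiple <= 0:
        raise ValueError("control_multiple must be positive")
    return observed_multiple / control_multiple

# Figures quoted above: 5.7x for the whole domain, 3.5x for the control pages.
rough = control_adjusted_lift(5.7, 3.5)
print(f"{rough:.2f}x")  # prints 1.63x
```

Once the tide is divided out, what remains is far smaller than the headline multiple.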

So what did the intervention actually do?

To isolate the real effect, the authors used an interrupted time-series model. In plain terms, that means fitting the traffic trend from before the change, projecting it forward, and measuring how far the treated pages broke from that expected path after January.
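For readers who think in code, here is a minimal sketch of that idea. It is not the authors' model: it assumes a simple log-linear trend, strictly positive weekly counts, and made-up data. The function names `fit_line` and `its_lift` are illustrative.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def its_lift(counts, change_index):
    """Ratio of observed to projected traffic after change_index.

    Fits a log-linear trend to counts[:change_index], projects it over the
    remaining periods, and returns the geometric-mean ratio of observed
    to projected values.
    """
    pre = counts[:change_index]
    a, b = fit_line(list(range(len(pre))), [math.log(c) for c in pre])
    log_gaps = [
        math.log(counts[t]) - (a + b * t)
        for t in range(change_index, len(counts))
    ]
    return math.exp(sum(log_gaps) / len(log_gaps))

# Hypothetical weekly referrals: 10% weekly growth, then a 1.5x step at week 8.
weekly = [100 * 1.1 ** t * (1.5 if t >= 8 else 1.0) for t in range(16)]
print(round(its_lift(weekly, 8), 2))
```

On this synthetic series the function recovers the 1.5x step, because the pre-change trend is perfectly clean. Real logs are noisier, which is why the uncertainty around the estimate matters as much as the estimate itself.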

The estimated effect of the AEO work was about 1.82x, with a 95% confidence interval of 1.31 to 2.54. A confidence interval is the range of effect sizes the data are reasonably consistent with. So the honest read is "somewhere between a 1.3x and 2.5x lift," not a clean 1.82x.
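One detail worth noticing: the interval is lopsided around 1.82, with 0.51 below and 0.72 above. That is what you would expect if the effect was estimated on a log scale and then converted back to a multiple. The check below is an assumption-laden sketch built only from the three numbers quoted here. It does not reproduce anything from the paper.

```python
import math

point, low, high = 1.82, 1.31, 2.54

# On a log scale a symmetric interval is centered on the geometric mean
# of its two ends.
geo_mid = math.sqrt(low * high)

# Implied standard error on the log scale, assuming a normal-based 95% interval.
implied_se = (math.log(high) - math.log(low)) / (2 * 1.96)

print(f"geometric midpoint: {geo_mid:.2f}")
print(f"implied log-scale SE: {implied_se:.3f}")
```

The geometric midpoint lands almost exactly on the reported 1.82x, which supports reading the result as a multiplicative effect with a fairly wide margin.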

That is still a positive result. It is just a far more modest one than 5.7x, and the gap between those two figures is the whole point.

Why the authors won't call it proven

The paper is careful, and you should be too. The authors ran a placebo-in-time test, which means pretending the intervention happened on a date when nothing actually changed, then checking whether the model "finds" a fake effect anyway. If it does, your method is too twitchy to trust.

That test returned p=0.16. Without getting lost in statistics, that value is not low enough to rule out chance. The pre-intervention period was short and noisy, which makes any before-and-after comparison shakier.
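A placebo-in-time check of the kind described above can be sketched as follows. This is an illustration under simple assumptions (a log-linear trend, strictly positive counts, synthetic data), not the paper's procedure, and it does not reproduce the reported p-value. The names `trend_break` and `placebo_in_time` are hypothetical.

```python
import math

def trend_break(counts, split):
    """Geometric-mean ratio of observed to projected counts after `split`,
    where the projection is a log-linear trend fitted to counts[:split]."""
    xs = list(range(split))
    ys = [math.log(c) for c in counts[:split]]
    mean_x, mean_y = sum(xs) / split, sum(ys) / split
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    gaps = [math.log(counts[t]) - (intercept + slope * t)
            for t in range(split, len(counts))]
    return math.exp(sum(gaps) / len(gaps))

def placebo_in_time(counts, real_split, min_pre=4, min_post=2):
    """Apply the same estimator at fake dates, using pre-period data only."""
    pre_only = counts[:real_split]
    return {
        fake: trend_break(pre_only, fake)
        for fake in range(min_pre, real_split - min_post + 1)
    }

# Hypothetical series: 12 pre-change weeks of 8% growth, then a 1.6x step.
weekly = [200 * 1.08 ** t * (1.6 if t >= 12 else 1.0) for t in range(20)]
placebos = placebo_in_time(weekly, real_split=12)
real = trend_break(weekly, 12)

for fake, ratio in placebos.items():
    print(f"fake change at week {fake}: {ratio:.2f}x")
print(f"real change at week 12: {real:.2f}x")
```

On clean synthetic data every fake date returns a ratio near 1.0x and only the real date shows a break. With a short, noisy pre-period, fake dates can produce sizable "effects" too, and that is when a real-looking lift becomes hard to distinguish from chance.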

In the paper's own framing, the effect is suggestive, not conclusive. Treat it as a promising signal from one site, not a law of nature.

One piece of good news

There is a quiet win buried in the results. A common fear with AEO is that restructuring content for AI answers will hurt your traditional Google rankings.

That did not happen here. Google organic clicks to the treated pages held steady, and the pages stayed indexed. Optimizing for the answer engine did not cost them the search engine, at least in this case.

What this means for you

  • Always run a control. Compare your optimized pages against untouched pages on the same site over the same window. Without that baseline, you cannot tell your work apart from the platform's growth.
  • Discount raw multiples. When a case study brags about 5x, ask what the unoptimized pages did. If they grew 3x on their own, the real story is much smaller.
  • Measure with first-party data. Server logs and your own analytics beat third-party estimators, which model traffic rather than count it.
  • Expect modest, uncertain lifts. A genuine effect in the 1.3x to 2.5x range is plausible and worth pursuing. A clean, confident, huge number usually means the measurement was sloppy.
  • You probably won't tank your SEO. Optimizing for AI answers did not reduce Google clicks or indexation in this study, so the "AEO will break my search traffic" fear is not automatic.
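To make the first-party point concrete, here is a small sketch that counts AI referrals straight from server logs and splits them into treated and control pages. It rests on several assumptions: the logs use the common Apache/Nginx "combined" format, ChatGPT visits arrive with a `chatgpt.com` or `chat.openai.com` referrer, and the optimized section lives under a hypothetical `/guides/` path. Adjust all three for your own site.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumes "combined" log format: request line, status, size, then referrer.
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)

AI_REFERRER_HOSTS = {"chatgpt.com", "chat.openai.com"}  # assumption
TREATED_PREFIX = "/guides/"  # hypothetical path of the optimized section

def count_referrals(lines):
    """Count AI-referred requests, split into treated and control pages."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        host = urlparse(match.group("referrer")).hostname or ""
        if host in AI_REFERRER_HOSTS:
            is_treated = match.group("path").startswith(TREATED_PREFIX)
            counts["treated" if is_treated else "control"] += 1
    return counts

# Made-up log lines for illustration.
sample = [
    '203.0.113.5 - - [12/Jan/2026:10:00:00 +0000] "GET /guides/aeo HTTP/1.1" 200 5120 "https://chatgpt.com/" "Mozilla/5.0"',
    '203.0.113.6 - - [12/Jan/2026:10:00:05 +0000] "GET /pricing HTTP/1.1" 200 2048 "https://chatgpt.com/" "Mozilla/5.0"',
    '203.0.113.7 - - [12/Jan/2026:10:00:09 +0000] "GET /guides/aeo HTTP/1.1" 200 5120 "https://www.google.com/" "Mozilla/5.0"',
]
print(count_referrals(sample))
```

Run the same count for matching windows before and after your change, and you have the two growth multiples, treated and control, that the rest of this checklist depends on.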

The takeaway

None of this says AEO is worthless. It says the way we measure AEO is often broken, and broken measurement inflates the wins.

The discipline the paper models, a control group, a real baseline, and honesty about uncertainty, is the difference between marketing you can act on and marketing you just retweet. Before you trust any AEO success story, including your own, ask one question first: compared to what?


Source: Watanabe & Nakayashiki (2026), "Disentangling Answer Engine Optimization from Platform Growth: A Log-Based Natural Experiment on ChatGPT Referral Traffic," arXiv:2606.04362.
