By AI Blog Editor
Jun 15, 2026

Selling agentic AI with agentic AI — KPMG pulled its flagship report on June 12 after GPTZero found 40 of 45 citations were fake, and four named customers said the case studies never happened

KPMG withdrew "Redefining excellence in the age of agentic AI" on June 12. GPTZero found only 5 of 45 citations matched a real source. UBS, Swiss Federal Railways, TfL, and NHS Greater Manchester each said the case studies KPMG attributed to them were not real.

Image: KPMG global headquarters at 1 Laan van Langerhuize, Amstelveen, the Netherlands. Photograph by DennisM, 19 March 2011. CC0 / public domain via Wikimedia Commons.

On Friday, June 12, 2026, KPMG quietly removed a flagship report from its website. The report was called Total Experience: Redefining excellence in the age of agentic AI. It had been published in October 2025 as a sales document — Big Four research aimed at the chief operating officers, chief information officers, and procurement leads who decide whether their company is going to spend the next three years rebuilding its workflows around AI agents. The reason for the takedown, according to the firm's statement, was that KPMG was "reviewing the circumstances surrounding its publication." The reason in plain language is that a detection company called GPTZero went through the report's footnotes one by one and found that the report selling clients on AI had been written, in significant part, by AI — and the AI had made up most of the receipts.

The audit numbers are the story. Of the 45 citations in the report, only five accurately pointed to real sources. The other 40 break down into two buckets: 28 paraphrased a real title, added a fake co-author or invented URL, or otherwise mangled the source past the point of usability; the remaining 12 were vague enough that GPTZero could not confirm any underlying paper existed at all. Four of the case studies the report leaned on — at UBS, Swiss Federal Railways, Transport for London, and NHS Greater Manchester — were disputed in writing by the named organisations once the Financial Times called to check.
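The arithmetic behind those buckets is easy to check. The tally below is a minimal sketch that uses only the counts reported above; the variable names are illustrative and are not taken from GPTZero's write-up.

```python
from fractions import Fraction

# Counts as reported in the audit summary above.
total = 45
verified = 5        # accurately pointed to a real source
mangled = 28        # paraphrased title, fake co-author, invented URL
unverifiable = 12   # too vague to match to any underlying paper

# The three buckets should account for every citation in the report.
assert verified + mangled + unverifiable == total

failed = mangled + unverifiable
failure_rate = failed / total
survival = Fraction(verified, total)   # reduces to lowest terms

print(f"failed: {failed} of {total} ({failure_rate:.1%})")
print(f"survived: {survival.numerator} in {survival.denominator}")
```

Run as written, this prints `failed: 40 of 45 (88.9%)` and `survived: 1 in 9`.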

This is a sentence that would have been hard to write two years ago. A Big Four firm published a report selling artificial intelligence to enterprise clients, the report's footnotes were generated by artificial intelligence, the artificial intelligence got them mostly wrong, and the people quoted in the report had to be informed by a newspaper that they were in it.

What GPTZero actually found

GPTZero is the detection company founded by Edward Tian in 2022, originally pitched at school teachers worried about ChatGPT essays. The product line has since moved up-market into research-integrity audits — the same outfit ran a 2025 analysis of a US Presidential Commission report that found similar patterns. For the KPMG paper, Tian's team published their methodology alongside the numbers. They took each footnote, searched the databases the footnote claimed to live in, and graded the match. Five citations passed cleanly. Twenty-eight came back as what Tian's team is calling vibe citing — a reference that looks plausible at first read, with a real author name, a real journal, sometimes a real DOI prefix, and a title that is almost but not quite a thing the cited author wrote. Twelve came back as unfalsifiable: phrased so generically that no specific paper could be identified at all.
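The three-way grading can be illustrated with a short Python sketch. This is not GPTZero's code or method. The lookup table, the author and title in it, the function names, and both thresholds are assumptions made up for illustration, and a real audit would query actual bibliographic databases rather than a dictionary.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a bibliographic database: author -> known titles.
# The entry is invented for illustration only.
KNOWN_WORKS = {
    "a. example": ["Measuring customer experience in retail banking"],
}

def title_similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two titles, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def grade_citation(author, title, exact=0.95, near=0.6):
    """Grade one footnote as 'verified', 'vibe-cited', or 'unverifiable'.

    The thresholds are arbitrary assumptions, not published values.
    """
    if not author or not title:
        return "unverifiable"      # too generic to search for at all
    candidates = KNOWN_WORKS.get(author.lower(), [])
    best = max((title_similarity(title, c) for c in candidates), default=0.0)
    if best >= exact:
        return "verified"
    if best >= near:
        return "vibe-cited"        # real author, almost-real title
    return "unverifiable"

print(grade_citation("A. Example", "Measuring customer experience in retail banking"))
print(grade_citation("A. Example", "Measuring the customer experience in retail banks"))
print(grade_citation("", "Industry research on AI agents"))
```

The three calls land in the three buckets in order: an exact title match, a near-miss on a real author's title, and a reference with nothing specific to look up.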

Tian, on the record about the broader implication: "Error-riddled papers published by the Big Four could 'poison the well of information' and could lead to second-hand AI hallucinations." That is the bit that should land for anyone who has watched a corporate research report get picked up by the wire services, then by trade press, then by the training set of the next model. A bogus citation in a KPMG PDF is no longer a footnote nobody reads. It is a fact in waiting.

What the four named organisations actually said

The denials are the most quotable part of the wreckage. Each of the four came in a different register, which says something about how the case studies were probably composed.

UBS told the Financial Times the claim that the bank had integrated AI agents across investment advisory, risk management, and compliance monitoring via a co-developed Microsoft platform was "factually incorrect." That is the bank-lawyer phrasing for: this did not happen. The pickup in SWI swissinfo.ch is the cleanest second source on the wording.

Swiss Federal Railways said the description of AI agents optimising passenger journeys across its network was "not accurate." The same Swiss-press pickup carries the line. SBB is Switzerland's national rail operator; whatever the case study described, the operator was not aware of it.

Transport for London said the claims around AI congestion management were "misleading."

NHS Greater Manchester said the assertions "did not align with the press release the footnotes indicated as their source." That last one is the giveaway. It is precisely the kind of sentence a press office writes when it has gone and read its own past press releases and confirmed that the sentences the report attributes to them are, in the most literal possible sense, not in their own files.

Run those four together and you get a clear picture of the failure mode. The report's authors (or rather the model standing in for them) picked four marquee logos in regulated industries, generated plausible-sounding case studies, and footnoted each one to a press release that on quick skim looked like it might support the claim. Nobody at any stage of the editorial chain clicked the links.

The Big Four hallucination pattern

KPMG is the fourth professional-services firm in the last twelve months to get caught publishing AI-fabricated content. EY had a study withdrawn under similar circumstances earlier in 2026; Deloitte, in a 2025 incident now widely cited by the trade press, refunded part of a contract to a government client after AI-fabricated content was found in a deliverable. The pattern is now consistent enough that one can describe it as a workflow rather than a series of accidents. A junior or mid-level consultant uses a frontier chatbot to summarise material, draft a section, or pull together case studies. The chatbot fabricates. The footnotes look right. The footnotes never get checked. The report goes out under the firm's letterhead at premium consultancy rates.

The recursive embarrassment is specific to the agentic-AI sell. The whole point of the KPMG report was to convince clients that agentic AI is reliable enough to deploy across compliance, claims, advisory, and operations. The report's footnotes are a live test of exactly that capability under exactly the conditions the report recommends — drafting at speed, with light review, by a generalist user. The test result is: only one footnote in nine survives cursory verification.

How this slots into the AI-slop story the Loop keeps writing

This is the second time in three weeks the Loop has covered an AI-hallucinated-citations story big enough to make the wires. On May 27 the topic was one fabricated citation per 277 PubMed papers in early 2026 — a Columbia/Lancet audit that found AI-fabricated references concentrated in the review articles that shape clinical guidelines. The mechanism in both stories is identical: a model generates plausible-sounding sources at speed, no human checks them, and the artefact gets shipped into a venue with downstream reach. The difference is the venue. The medical-literature story is about scientific record integrity. The KPMG story is about the corporate-research feed that flows into procurement decisions worth tens of millions of pounds, dollars, francs, and euros.

If you connect the two pieces, the takeaway is uncomfortable. The fabricated-citation problem is not a fringe failure mode at the long tail of the model output. It is showing up in two very different professional contexts at once, both of which are supposed to be among the most defended against this kind of failure. Peer review is supposed to catch it. Big Four editorial review is supposed to catch it. Neither did.

KPMG's spokesperson, on the record: "The firm expects all staff to follow guidelines on responsible AI use, including human oversight to validate content and verify independent sources." That is the right sentence to put in the press release. It is the sentence the report's review chain was supposed to enforce in October and failed to enforce until June. A guideline is not the same as a tripwire.
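The difference between a guideline and a tripwire can be made concrete. The sketch below is a hypothetical pre-publication gate in Python, not a description of any KPMG system. The `Citation` record, the `publication_gate` function, and the rule that every footnote needs a named human verifier are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Citation:
    footnote: int
    source: str
    verified_by: Optional[str] = None   # person who opened and read the source

class UnverifiedCitations(Exception):
    """Raised when a draft still contains citations nobody has checked."""

def publication_gate(citations):
    """Refuse to publish unless every citation has a named human verifier."""
    unchecked = [c.footnote for c in citations if not c.verified_by]
    if unchecked:
        raise UnverifiedCitations(f"unchecked footnotes: {unchecked}")
    return True

# Hypothetical draft: footnote 2 was never opened by a person.
draft = [
    Citation(1, "press release", verified_by="editor"),
    Citation(2, "journal article"),
]
try:
    publication_gate(draft)
except UnverifiedCitations as err:
    print(err)
```

A guideline asks reviewers to check. A gate like this one stops the release when the check has not been recorded, which is the property a written policy lacks.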

What this means

Three takeaways.

Vibe citing is now a named failure mode with a number attached. Five of forty-five verified. Twenty-eight mangled past the point of usability. Twelve too vague to verify. That is a failure rate of just under 89% on what is supposed to be the load-bearing evidence in a flagship Big Four research product. The phrase will travel, and every CIO who has ever sent a McKinsey deck back for fact-checking now has a precedent to cite when they ask their next consulting vendor exactly how the bibliography got compiled.

The recursive embarrassment matters for sales. KPMG's competitive pitch on agentic AI is the same as every other Big Four firm's: that the model is now reliable enough, in 2026, to be embedded in compliance and operations workflows with light human review. The report meant to make that case is itself a counter-example to its own thesis. It is hard to imagine a more on-brand failure: a document selling agentic AI to enterprises that demonstrates the exact problem with deploying agentic AI to enterprises. Expect every competitor's salesperson to be holding a printout of the GPTZero analysis by Monday morning.

The detection layer is now a beat. GPTZero is on its second high-profile fabrication audit inside a year. Retraction Watch, which the Loop added as a source in late May, is running the same kind of forensic work in scientific publishing. The pattern across both is that the people exposing fabricated citations now have product-grade methodologies and named CEOs talking to wire reporters. That is what a beat looks like when it grows up. Expect more of these, faster, and expect the firm being audited to lose more time and more contract value each round.

KPMG's report was 24 pages, written to look like research, and footnoted to look like evidence. On June 12 it stopped existing on the public web. The reasons it existed in the first place — the deadline pressure, the staffing model, the assumption that the chatbot has read what it says it has read — are still in place at every other firm publishing this week.

* * *

Thanks for reading. If a line here was useful — or plainly wrong — the comments are below and the newsletter has your back.
