By AI Blog Editor
May 27, 2026 · 15 min read

One in 277 — AI-hallucinated citations have reached the medical literature, and 98% are still sitting there

A Lancet audit of 2.5 million biomedical papers found AI-fabricated citations up twelvefold since 2023 — and concentrated in the review articles that set clinical guidelines. 98.4% have had no publisher action.

The U.S. National Library of Medicine building on the NIH campus in Bethesda, Maryland. The library runs PubMed and PubMed Central, the database the audit scanned. Photo: Chris Spielmann, public domain via Wikimedia Commons.

On May 7, 2026, a team led by Columbia University nurse and health-AI researcher Maxim Topaz published a single page of correspondence in The Lancet with a number in it that should bother anyone who has ever taken a drug because their doctor read about it in a journal. In the first seven weeks of 2026, one in every 277 papers indexed in PubMed cited at least one study that does not exist.

Three weeks later the finding is still moving — Fortune ran it on May 24, The Decoder on May 26 — and not because the headline number grew. It is because of where the fakes are landing.

The audit, and the curve that points the wrong way

Topaz's team scanned roughly 2.47 million biomedical papers published between January 2023 and February 18, 2026, and checked 97.1 million references against the databases those references claimed to live in. They found 4,046 citations to papers that were never written, spread across 2,810 articles, per Columbia's writeup of the study and Retraction Watch.

In absolute terms that is almost nothing: 4,046 phantom references out of 97.1 million is about one in 24,000, a rounding error. If the story stopped there it would not be a story.

It does not stop there, because the rate is on a curve. In 2023, roughly one paper in 2,828 carried a fabricated citation. By 2025 it was one in 458. In the first seven weeks of 2026 it was one in 277. Counted per reference rather than per paper, that is 56.9 fabricated references per 10,000 papers, up from around four per 10,000 in 2023. That is a roughly twelvefold increase since 2023, and the sharpest jump tracks mid-2024, when AI writing assistants went mainstream. The researchers call their own count a conservative underestimate, which is the polite way of saying the real number is worse.
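For readers who want to check the arithmetic, here is a minimal Python sketch that converts the "one in N" figures quoted above into affected papers per 10,000 and works out the corpus-wide ratios. Every number comes from this article; the variable and function names are illustrative only. Note that it counts affected papers, while the 56.9 figure counts fabricated references, so the two will not match.

```python
# Figures quoted in the article: "one paper in N" carried a fabricated citation.
one_in = {"2023": 2828, "2025": 458, "2026 (first seven weeks)": 277}


def per_10k(n: int) -> float:
    """Convert 'one paper in n' into affected papers per 10,000."""
    return 10_000 / n


for period, n in one_in.items():
    print(f"{period}: {per_10k(n):.1f} affected papers per 10,000")

# Fold change in the share of affected papers, 2023 to early 2026.
paper_fold = one_in["2023"] / one_in["2026 (first seven weeks)"]
print(f"fold change in affected papers: {paper_fold:.1f}x")

# Whole-corpus totals from the audit.
fake_refs, affected_papers, total_refs = 4_046, 2_810, 97_100_000
refs_per_affected_paper = fake_refs / affected_papers
print(f"fabricated references per affected paper: {refs_per_affected_paper:.2f}")
print(f"one fabricated reference in every {total_refs / fake_refs:,.0f} checked")
```

On these inputs the paper-level rates come to about 3.5, 21.8 and 36.1 per 10,000, a roughly tenfold rise. An affected paper carries about 1.44 fabricated references on average across the whole audit, which is presumably why the per-reference rate, and the twelvefold figure built on it, runs higher than the per-paper one.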

The fakes cluster exactly where you would least want them

Here is the part that turns a curiosity into a problem. Fabricated references are not spread evenly across the literature. Review articles — the papers that summarise a field and synthesise its evidence — showed a 57% higher fabrication rate than other paper types, per The Decoder's read of the study and Medical Dialogues.

Reviews are not just any papers. They are the input to clinical guidelines. A systematic review aggregates the trials; a guideline cites the review; a doctor or nurse follows the guideline. Put a phantom study into a review and you have quietly poisoned the top of the chain that ends at a patient's bedside. Topaz put it plainly to Columbia: "A medical professional or clinical guideline developer has no way of knowing that the evidence they are relying on does not exist."

The History of Medicine Reading Room at the U.S. National Library of Medicine, lined with shelves of bound medical volumes.

The researcher who almost did it himself

The most useful detail in the whole episode is how Topaz got interested. He nearly published a fabricated citation in one of his own editorials, caught it, and was rattled enough to go count how often it was happening to everyone else. "I was deeply embarrassed: I checked for that, and it still almost happened to me," he told STAT.

If the AI-hallucination researcher cannot keep hallucinated citations out of his own footnotes, the honour system that currently governs reference-checking is not going to hold. That is the case for the whole study in one anecdote.

What the journals did about it: mostly nothing

The follow-up number is the one that should embarrass the publishers rather than the authors. As of the audit, 98.4% of the papers with fabricated references had seen no publisher action — no correction, no flag, no retraction. The default institutional response to a citation of a study that was never run is, overwhelmingly, to leave it where it is.

Asked what they do about it, the big journals gave the answers you would expect. Science, the NEJM, and JAMA told STAT they run citation-validation tooling and lean on author-accountability agreements. PLOS was more honest about the state of the art: it has seen "numerous" unverifiable references in submissions, is piloting fixes, and is tripping over false positives while it does. Renee Hoch, who runs publication ethics at PLOS, said the group is "looking to incorporate this into our publishing workflows and have been exploring offerings in this space." The people closest to the problem will tell you nobody has solved it yet.

The fake reference is the symptom, not the disease

The sharper criticism in the reporting is not about hallucinations at all. It is about reading. Mohammad Hosseini, a research-ethics professor at Northwestern, told STAT that "engagement with the literature is becoming increasingly more superficial" as generative AI takes over the drafting. A fabricated citation only survives to publication if nobody — author, co-authors, reviewers, editors — clicked through to check it. The phantom reference is the visible tip of a habit: citing things you have not read, on the assumption the machine read them for you.

Misha Teplitskiy, a sociologist of science at Michigan, framed the question the way it deserves to be framed: "Is AI making science more efficient, helping us do better work, or even the same work, but faster, or is it just creating slop?" This audit is the first hard data point in what has mostly been an argument about vibes.

The reading room of the old Army Medical Library, a predecessor of today's National Library of Medicine, in an early-20th-century photograph.

arXiv moved first, and made the penalty hurt

The institution that moved first was not a medical publisher. In mid-May, arXiv — the preprint server that physics, math, and computer-science research runs on — announced a one-year ban for authors who submit papers containing incontrovertible evidence of unchecked AI output: hallucinated references, or the tell-tale leftover chatbot boilerplate like "would you like to make any changes?" still sitting in the manuscript. Thomas Dietterich, who chairs arXiv's computer-science section, said the standard is negligence so obvious that "we can't trust anything in the paper," per TechCrunch and The Next Web. Using AI to draft is still fine; shipping the unread output is what gets you banned.

The punishment has a sting most coverage skipped. After the year is up, a banned author's next submission has to be accepted by a peer-reviewed venue before arXiv will take it. The open preprint server's penalty for AI slop is to send you back through the gatekeepers arXiv was built to route around.

What to watch

Review articles are the leading indicator. The 57%-higher fabrication rate in reviews is the number that matters, because reviews feed guidelines. If anyone runs a targeted audit, that is where to point it — and a clean answer there is worth more than the headline one-in-277.
Detection is cheap now; correction is not. Topaz's team automated the hard part — scanning millions of papers for citations that point nowhere — across the whole corpus. The 98.4%-untouched figure says the bottleneck is no longer technical. It is institutional will. Watch whether a single major publisher issues retroactive corrections at scale, or whether they all keep gesturing at "tooling."
The realistic fix is plumbing, not abstinence. Topaz's own prescription, to Fortune: "The problem is unverified AI output entering the permanent record. The fix is not to stop using tools, it's to build verification into the workflow." Reference-checking that runs automatically at submission is dull, unglamorous, and the only thing on the table that scales with the problem.
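What "verification built into the workflow" could look like at submission time is sketched below in Python. This is not the method Topaz's team or any publisher uses. The `Reference` type, the `check_references` function, the `resolve` callable and the toy index are all hypothetical stand-ins. A real deployment would plug in a lookup against an actual bibliographic database.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Reference:
    title: str
    identifier: Optional[str] = None  # e.g. a DOI or PMID, if the author gave one


# Hypothetical resolver: takes an identifier, returns the indexed title or None.
Resolver = Callable[[str], Optional[str]]


def normalise(title: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace before comparing."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def check_references(
    refs: Iterable[Reference], resolve: Resolver
) -> list[tuple[Reference, str]]:
    """Return the references a human should look at, each with a reason."""
    flagged = []
    for ref in refs:
        if ref.identifier is None:
            flagged.append((ref, "no identifier to check"))
            continue
        found = resolve(ref.identifier)
        if found is None:
            flagged.append((ref, "identifier not found"))
        elif normalise(found) != normalise(ref.title):
            flagged.append((ref, "identifier resolves to a different title"))
    return flagged


# Toy in-memory index standing in for a real database (placeholder data).
TOY_INDEX = {"id:0001": "A Trial of Drug A in Condition B"}

manuscript = [
    Reference("A trial of drug A in condition B", "id:0001"),
    Reference("A study that was never written", "id:9999"),
    Reference("Another study", None),
]

flagged = check_references(manuscript, TOY_INDEX.get)
for ref, reason in flagged:
    print(f"FLAG: {ref.title!r}: {reason}")
```

The sketch flags references for review and rejects nothing. Exact title matching after normalisation will misfire on legitimate references with typos or variant titles, which is the false-positive problem PLOS describes, so the output is a queue for a person to check, not a verdict.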

The audit checked 97.1 million references and found fakes in 2,810 of roughly 2.47 million papers, a fraction small enough to round away. The reason that is still worth a thousand words is that medicine is the one field where a citation to a study nobody ever ran can end up in the instructions a nurse follows at three in the morning. Right now, 98.4% of the time, nobody has gone back to take it out.

* * *

Thanks for reading. If a line here was useful — or plainly wrong — the comments are below and the newsletter has your back.
