By AI Blog Editor
May 8, 2026 · 14 min read

Bleeding Llama — three Ollama CVEs in one week, and an inbox no one was reading

On May 5, 2026, Cyera disclosed a critical heap leak in Ollama. The same week, Striga and CERT Polska published two Windows auto-updater RCEs that the maintainers had ignored for five weeks. One bug is patched. Two are not.

Image: Llama at Machu Picchu. Dedicated to the public domain (CC0) via Wikimedia Commons.

On May 5, 2026, two unrelated research teams published Ollama vulnerability writeups within hours of each other. Cyera Research called theirs "Bleeding Llama" — a critical heap-memory leak, CVSS 9.1, fixable by upgrading to v0.17.1. Striga's Bartłomiej Dmitruk published a Windows auto-updater chain that turns Ollama itself into a persistent malware delivery system, and noted, in a sentence Ollama would probably prefer he hadn't, that he had been waiting since late January for the maintainers to read the email.

The bugs are unrelated. The week is not. Ollama — 170,000 GitHub stars, 100 million Docker Hub pulls, the default answer to "how do I run an LLM on my laptop?" — got hit with three CVEs in seven days, one of them handled cleanly, two of them ignored until a national CERT had to step in. The local-LLM stack's pitch is that running models on your own hardware is safer than handing prompts to OpenAI. This week is what the asterisk on that sentence looks like.

What Bleeding Llama actually does

Cyera's writeup is the kind of bug report that reads like a tutorial. Researcher Dor Attias found that Ollama's GGUF model loader trusts the file's own metadata about how big each tensor is, then reads that many bytes off disk and into a pre-allocated heap buffer using Go's unsafe package. The validation step that should reject "this tensor claims to be 10 GB but the file is 100 MB" does not exist. The vulnerable code lives in fs/ggml/gguf.go and server/quantization.go, in a function called WriteTo that hands an unchecked tensor shape to ConvertToF32, which then walks well past the buffer and into whatever heap memory was sitting next to it.
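To make the missing check concrete, here is a minimal Go sketch of the validation step described above. It is not Ollama's code and not its actual patch: tensorInfo, declaredBytes and validateTensor are hypothetical names, and the size arithmetic is simplified to "elements times bytes per element," which ignores block-quantized tensor types.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// tensorInfo is a hypothetical, simplified stand-in for the per-tensor
// metadata a GGUF header declares. It is not Ollama's real type.
type tensorInfo struct {
	Name     string
	Shape    []uint64 // declared dimensions
	ElemSize uint64   // bytes per element (simplified assumption)
	Offset   uint64   // declared start of the tensor data in the file
}

var errTensorBounds = errors.New("declared tensor size exceeds file data")

// declaredBytes multiplies the shape out, refusing to overflow uint64.
func declaredBytes(t tensorInfo) (uint64, error) {
	n := t.ElemSize
	for _, d := range t.Shape {
		if d != 0 && n > math.MaxUint64/d {
			return 0, fmt.Errorf("tensor %q: declared size overflows uint64", t.Name)
		}
		n *= d
	}
	return n, nil
}

// validateTensor rejects a tensor whose declared extent does not fit
// inside the bytes the file actually contains.
func validateTensor(t tensorInfo, fileSize uint64) error {
	n, err := declaredBytes(t)
	if err != nil {
		return err
	}
	if t.Offset > fileSize || n > fileSize-t.Offset {
		return fmt.Errorf("tensor %q: %w (declared %d bytes at offset %d, file has %d)",
			t.Name, errTensorBounds, n, t.Offset, fileSize)
	}
	return nil
}

func main() {
	// A tensor that claims 10 GiB of float32 data inside a 100 MiB file.
	bad := tensorInfo{Name: "example.weight", Shape: []uint64{2684354560}, ElemSize: 4, Offset: 4096}
	fmt.Println(validateTensor(bad, 100<<20))
}
```

The point of the sketch is the ordering: the declared size is checked against the real file length before any buffer is sized or read from, so a header that lies about its tensors is rejected at load time.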

The attack chain is three unauthenticated HTTP calls, confirmed independently by SecurityWeek and runZero:

1. POST /api/blobs/sha256:<hash> uploads a malicious GGUF blob.
2. POST /api/create asks Ollama to quantize it. The over-read happens here. The leaked memory gets baked into the model weights the server is producing.
3. POST /api/push to a registry the attacker controls exfiltrates the resulting "model," which is now your prompts, environment variables, and API keys with a tensor header on top.

Cyera's one-liner, which holds up to the technical description: "A malicious actor can craft a GGUF file that declares a far larger tensor size than the actual data provided, forcing Ollama to read well beyond the intended buffer boundary — accessing sensitive data stored on the heap." What ends up in the leaked blob is whatever the process happened to be holding: in their tests, system prompts from other models on the server, message history from concurrent users, and environment variables from the host shell. If you set OPENAI_API_KEY in the same process for a downstream tool, that ships too.

The disclosure timeline is the well-behaved one. Cyera reported the bug on February 2, 2026; Ollama acknowledged it on February 25 and shared a fix; MITRE assigned CVE-2026-7482 on April 28; the public writeup followed on May 5, well after the patched 0.17.1 release. Anyone who upgrades is fine. The 300,000 figure that ran across every secondary outlet is Cyera's own count of internet-exposed Ollama servers, and it is the load-bearing number in the story for one reason: Ollama by default ships with no authentication and a habit of binding to 0.0.0.0. "If your Ollama server was internet-accessible," Cyera wrote, "assume environment variables and secrets in memory may be compromised." That is a sentence that costs a credential rotation.

The two CVEs that did not get the same treatment

Same week, different research lab, different outcome. Striga's Bartłomiej Dmitruk reported two Windows-only flaws in late January 2026: CVE-2026-42248, a signature-verification function that gets called and then does nothing, and CVE-2026-42249, a path traversal in the unsanitised ETag HTTP header that lets the auto-updater plant an executable into %APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup. Chained, they are persistent remote code execution that runs on every login, dressed up as an Ollama update.
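The ETag half of that chain is a classic case of using a server-supplied string as part of a file path. A minimal Go sketch of the defensive version follows. It is an illustration under assumptions, not Ollama's updater code: safeStagingPath is a hypothetical helper, and the allow-list of characters is one reasonable choice among several.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// safeStagingPath is a hypothetical helper. It turns a server-supplied
// ETag into a path under baseDir and refuses anything that could escape it.
func safeStagingPath(baseDir, etag string) (string, error) {
	name := strings.Trim(etag, `"`) // strong ETags arrive wrapped in quotes
	if name == "" {
		return "", errors.New("empty ETag")
	}
	// Allow-list: letters, digits, '-' and '_' only. That excludes path
	// separators, dots, drive letters and weak-ETag prefixes such as W/.
	for _, r := range name {
		ok := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') ||
			(r >= '0' && r <= '9') || r == '-' || r == '_'
		if !ok {
			return "", fmt.Errorf("ETag contains disallowed character %q", r)
		}
	}
	// Defence in depth: confirm the joined path is still inside baseDir.
	full := filepath.Join(baseDir, name)
	rel, err := filepath.Rel(baseDir, full)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("path escapes staging directory")
	}
	return full, nil
}

func main() {
	for _, etag := range []string{`"abc123"`, `"..\..\Startup\evil"`, `"../../evil"`} {
		p, err := safeStagingPath("updates", etag)
		fmt.Printf("%-24s path=%q err=%v\n", etag, p, err)
	}
}
```

An allow-list is used rather than a search for ".." because it fails closed on inputs nobody thought of, which is the property the original code lacked.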

Dmitruk's description of CVE-2026-42248 is one of those rare bug reports a non-engineer can read straight: "the Windows build's auto-updater signature verification function exists, it gets called, but it does nothing, and whatever is downloaded gets executed."

That is the entire vulnerability. There is no clever exploit primitive. There is a function with the right name that someone forgot to wire up.
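For contrast, here is a minimal Go sketch of what a wired-up, fail-closed check looks like. It swaps in a detached Ed25519 signature purely for illustration; the source does not say which signing scheme Ollama's Windows updater is meant to use, and verifyUpdate, installUpdate and signForDemo are hypothetical names.

```go
package main

import (
	"crypto/ed25519"
	"errors"
	"fmt"
)

var errBadSignature = errors.New("update signature did not verify")

// verifyUpdate returns nil only when sig is a valid Ed25519 signature of
// payload under pub. Every other outcome is an error.
func verifyUpdate(pub ed25519.PublicKey, payload, sig []byte) error {
	if len(pub) != ed25519.PublicKeySize || len(sig) != ed25519.SignatureSize {
		return errBadSignature
	}
	if !ed25519.Verify(pub, payload, sig) {
		return errBadSignature
	}
	return nil
}

// installUpdate calls the install step only after verification succeeds.
func installUpdate(pub ed25519.PublicKey, payload, sig []byte, install func([]byte) error) error {
	if err := verifyUpdate(pub, payload, sig); err != nil {
		return err // fail closed: nothing runs on a bad signature
	}
	return install(payload)
}

// signForDemo generates a throwaway key pair and signs payload with it.
// A real updater would ship only the public key.
func signForDemo(payload []byte) (ed25519.PublicKey, []byte, error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return nil, nil, err
	}
	return pub, ed25519.Sign(priv, payload), nil
}

func main() {
	payload := []byte("pretend installer bytes")
	pub, sig, err := signForDemo(payload)
	if err != nil {
		fmt.Println("key generation failed:", err)
		return
	}
	install := func(p []byte) error {
		fmt.Printf("installing %d verified bytes\n", len(p))
		return nil
	}
	fmt.Println("genuine: ", installUpdate(pub, payload, sig, install))
	fmt.Println("tampered:", installUpdate(pub, append(payload, '!'), sig, install))
}
```

The design choice worth copying is that the verifier returns an error rather than a boolean someone can forget to read, and the install step is only reachable through the path that checks it.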

The interesting part is not the bug. The interesting part is what happened after Striga reported it. Per Help Net Security's writeup and CERT Polska's own advisory, the email to Ollama's documented security contact never got a reply. A maintainer's personal address yielded one acknowledgement, then silence. Five weeks later, Striga handed the disclosure off to CERT Polska, which assigned the CVEs and pushed a public warning on April 29, 2026, listing 0.12.10 through 0.17.5 as confirmed vulnerable. Static analysis suggested the bug was still live through 0.22.0. Ollama's v0.23.0, released two days before the disclosure went public, shipped without a fix.

The mitigation in CERT Polska's advisory is not "upgrade." It is "turn off auto-updates and delete the Startup-folder shortcut Ollama writes when you install it." The product feature is the attack surface; the only thing the user can do is amputate it.
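Checking the Startup folder by hand is easy; for anyone auditing more than one machine, a small Go sketch along these lines can list what is in it. This is a generic helper, not something from the CERT Polska advisory: startupDir and isExecutableName are hypothetical names, the extension list is illustrative rather than exhaustive, and the program only reports entries without deleting anything.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// startupDir builds the per-user Startup folder path from the roaming
// AppData directory (the %APPDATA% location on Windows).
func startupDir(appData string) string {
	return filepath.Join(appData, "Microsoft", "Windows", "Start Menu", "Programs", "Startup")
}

// isExecutableName reports whether a file name carries an extension that
// Windows runs directly. The list is illustrative, not exhaustive.
func isExecutableName(name string) bool {
	switch strings.ToLower(filepath.Ext(name)) {
	case ".exe", ".bat", ".cmd", ".com", ".scr":
		return true
	}
	return false
}

func main() {
	appData, err := os.UserConfigDir() // resolves to %AppData% on Windows
	if err != nil {
		fmt.Println("cannot locate AppData:", err)
		return
	}
	dir := startupDir(appData)
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Println("cannot read", dir+":", err)
		return
	}
	for _, e := range entries {
		note := ""
		if isExecutableName(e.Name()) {
			note = "  <-- executable, review"
		}
		fmt.Println(e.Name() + note)
	}
}
```

Shortcuts (.lnk files) are normal in that folder; a bare executable that nobody remembers putting there is the thing to look at.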

The 300,000 servers question

The Bleeding Llama story is about a memory bug. The Striga story is about a project that has outgrown its maintenance posture. They share a precondition: Ollama's default deployment is friendlier than its threat model.

Run the installer on Windows and you get an auto-updater that trusts whatever its update server sends, behind a signature check that checks nothing. Run the Linux container and you get an HTTP API on :11434 with no authentication and a willingness to accept arbitrary GGUF files from anyone who can reach the port. The 300,000 number is not "people who deliberately published their AI to the internet." It is "people who installed a tool, didn't add a reverse proxy, and didn't realise the box was already on the public internet." That is a much larger group, and it is the one OpenAI and Anthropic do not have, because they are the reverse proxy.
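The missing reverse proxy does not have to be elaborate. Below is a minimal Go sketch of one that demands a bearer token before forwarding anything to an Ollama instance on 127.0.0.1:11434. The assumptions are loud ones: PROXY_TOKEN is a made-up environment variable, the :8080 listen port is arbitrary, and TLS is left out for brevity, so a real deployment would need HTTPS in front of this.

```go
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

// authorized reports whether an Authorization header value carries the
// expected bearer token. An empty configured token never authorizes.
func authorized(header, token string) bool {
	if token == "" {
		return false
	}
	want := []byte("Bearer " + token)
	return subtle.ConstantTimeCompare([]byte(header), want) == 1
}

// requireToken wraps next and rejects requests without a valid token.
func requireToken(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// Assumes Ollama itself listens only on loopback.
	upstream, err := url.Parse("http://127.0.0.1:11434")
	if err != nil {
		log.Fatal(err)
	}
	token := os.Getenv("PROXY_TOKEN") // hypothetical variable name
	if token == "" {
		log.Fatal("PROXY_TOKEN must be set; refusing to start without one")
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	log.Fatal(http.ListenAndServe(":8080", requireToken(token, proxy)))
}
```

With something like this in place, the three-call chain described earlier stops at the first request for anyone who does not hold the token, provided the Ollama port itself is no longer reachable from outside.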

This is the local-LLM tax. The pitch — your prompts never leave your hardware, your weights never leave your hardware, you control everything — is real. So is the bill. You inherit the auth story, the patch cadence, the listening-port hygiene, and the "is anyone reading the security inbox" question. Self-hosted is safer than SaaS only for self-hosters who are willing to be their own security team. A meaningful share of the people Ollama has spent two years onboarding will never fit that description and were never told they needed to.

Ollama is not unusual here. It is unusually visible — GGUF is load-bearing for open-weights inference, the Docker numbers are real — but llama.cpp servers, vLLM endpoints, and Text Generation WebUI installations have all shipped with the same "no auth, bind to everything, trust the file format" defaults at one point or another. The frontier-lab APIs charge for the part of the system that is bored, awake, and reading the email.

What to watch

Whether v0.23.x ships a Striga fix. Five weeks of silence followed by a CERT Polska disclosure is the kind of incident that, in a healthier project, ends with an out-of-band patch in the same week. If the next Ollama release notes do not name CVE-2026-42248 and CVE-2026-42249, that is the headline, not the bugs themselves.
Whether anyone changes the default network posture. A reasonable read of this week is that Ollama's defaults are the bug. Binding to localhost by default, requiring an explicit opt-in to expose the API, and shipping a built-in token option (something like an OLLAMA_AUTH_TOKEN setting) would close most of the real-world exposure without changing the laptop developer experience. The next minor release is the test of whether the project agrees.
Whether the open-weights stack picks up a coordinated disclosure habit. llama.cpp, vLLM, Text Generation WebUI, and Ollama collectively run more inference than most cloud labs. None of them runs a security response process that looks like a vendor's. CERT Polska had to step in for one project this month. The question for the rest of 2026 is whether they have to keep doing it.
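The second item on that list, loopback by default with an explicit opt-in for anything wider, is small enough to sketch in Go. This is an illustration of the policy, not a patch or a description of Ollama's configuration handling: chooseListenAddr, isLoopback and the exposeOK flag are all hypothetical, and the default address simply reuses the :11434 port mentioned above.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopback reports whether host names a loopback-only interface.
func isLoopback(host string) bool {
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// chooseListenAddr returns the address to listen on. requested is the
// operator-supplied host:port (empty means "use the default"). exposeOK
// is a hypothetical explicit opt-in for non-loopback binds.
func chooseListenAddr(requested string, exposeOK bool) (string, error) {
	if requested == "" {
		return "127.0.0.1:11434", nil
	}
	host, port, err := net.SplitHostPort(requested)
	if err != nil {
		return "", err
	}
	// An empty host (":11434") means every interface, so it needs opt-in too.
	if isLoopback(host) || exposeOK {
		return net.JoinHostPort(host, port), nil
	}
	return "", fmt.Errorf("refusing to bind %q without explicit opt-in", requested)
}

func main() {
	for _, req := range []string{"", "0.0.0.0:11434", "[::1]:11434"} {
		addr, err := chooseListenAddr(req, false)
		fmt.Printf("requested=%q addr=%q err=%v\n", req, addr, err)
	}
}
```

The laptop experience does not change under this policy, since a local client still reaches 127.0.0.1; only the accidental public bind becomes something the operator has to ask for.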

For now, if you run Ollama on a server you care about: upgrade to 0.17.1 or later, put it behind something that asks for a password, and assume any keys you fed it before this week have walked. If you run it on Windows, turn the auto-updater off and check your Startup folder. The project will catch up. The 300,000 servers may not.

* * *

Thanks for reading. If a line here was useful — or plainly wrong — the comments are below and the newsletter has your back.
