By AI Blog Editor
May 4, 2026

Matched and shipping: AISI's hardest cyber range now has two solvers, and only one is still in the lab

AISI just confirmed GPT-5.5 is the second model to complete its 32-step corporate network attack range end-to-end. The first was Claude Mythos, which Anthropic still hasn't shipped. AISI also found a universal jailbreak in six hours.

Image: The National Cyber Security Centre in Victoria, London. UK government photo, Open Government Licence v3.0, via Wikimedia Commons.

On April 30, 2026, the UK's AI Security Institute published its evaluation of an early checkpoint of OpenAI's GPT-5.5 against its cyber bench. The headline result: GPT-5.5 is the second model ever to complete AISI's 32-step corporate network attack simulation end-to-end, and on the report's hardest tier of capture-the-flag tasks it scores slightly above Claude Mythos Preview, within overlapping error bars. The same day, OpenAI began rolling out GPT-5.5-Cyber (a permissive, gated variant of the same model) to its Trusted Access partners.

The Loop has covered both halves of this picture separately. Mythos was the model Anthropic refused to ship, citing capabilities measured against this exact bench. Trusted Access was OpenAI's gated-distribution answer for a model with similar properties. AISI's report is the first independent confirmation that the two are now in the same league. It also documents a universal jailbreak that took expert red-teamers six hours to find.

What AISI actually tested

AISI's bench has two parts. The first is a suite of 95 narrow capture-the-flag tasks across four difficulty tiers, covering reverse engineering, web exploitation, cryptography, and vulnerability research. The second is what AISI calls a "cyber range": multi-step simulations on a network that approximates a real corporate environment.

The hardest range is called The Last Ones. AISI describes it as a 32-step engagement across four subnets and roughly 20 hosts, estimated to take a human expert around 20 hours to complete end-to-end. Until April 13 of this year, no model had ever finished it. Mythos Preview was the first, in 3 of 10 attempts. The fact that no model could do this was a load-bearing data point in Anthropic's case for keeping Mythos out of the API.

That data point now has a footnote.

Two of ten attempts

GPT-5.5 finished The Last Ones end-to-end in 2 of 10 attempts. AISI's wording is bare: "GPT-5.5 completed TLO end-to-end in 2 of 10 attempts, making it the second model to do so." On the expert-tier capture-the-flag tasks, GPT-5.5 averaged 71.4% (±8.0%), against 68.6% (±8.7%) for Mythos Preview, 52.4% for the prior GPT-5.4, and 48.6% for Opus 4.7.

The error bars on those two leading numbers overlap, so "matches" is the honest word; "beats" is what a press release would say. AISI does not say beats.
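The overlap claim is simple interval arithmetic. A minimal Python sketch, assuming the ± figures quoted above are symmetric half-widths around each mean (the confidence level behind them is not stated here) and using hypothetical helper names:

```python
def interval(mean, half_width):
    """Return (low, high) for a mean with a symmetric +/- half-width."""
    return (mean - half_width, mean + half_width)


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Expert-tier CTF averages as quoted in the text, in percent.
gpt_55 = interval(71.4, 8.0)
mythos_preview = interval(68.6, 8.7)

# Distance between the two means, for comparison with the half-widths.
gap = abs(71.4 - 68.6)

print("intervals overlap:", overlaps(gpt_55, mythos_preview))
print("gap smaller than either half-width:", gap < min(8.0, 8.7))
```

Both checks print True. Overlapping intervals are a coarse heuristic rather than a significance test, but a gap of under three points against half-widths of eight or more is why "matches" is the safer word.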

The most quotable result in the whole report is a single CTF challenge called rust_vm. AISI describes it as a custom virtual machine embedded in a Rust binary, with hidden password-validation logic. The model wrote a disassembler, recovered the logic, and solved the challenge in 10 minutes and 22 seconds at a cost of $1.73 in API usage. AISI's reference timing for a human expert with Binary Ninja, gdb, Python, and Z3 was about 12 hours.

That is a sentence that costs $1.73.
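For scale, the gap between the model's run and AISI's human reference can be worked out directly. A rough sketch using only the rust_vm figures quoted above; the ratio and the per-hour cost are derived illustrations, not numbers from the report:

```python
# Figures as quoted in the text for the rust_vm challenge.
model_seconds = 10 * 60 + 22   # 10 minutes 22 seconds
human_hours = 12               # AISI's approximate human-expert reference
api_cost_usd = 1.73

# How many times faster the model run was than the human reference.
speedup = (human_hours * 3600) / model_seconds

# API spend divided by the human hours the reference timing implies.
cost_per_reference_hour = api_cost_usd / human_hours

print(f"speedup: about {speedup:.0f}x")
print(f"API cost per reference human-hour: ${cost_per_reference_hour:.2f}")
```

This prints a speedup of about 69x and roughly $0.14 of API spend per reference human-hour. Both figures inherit the looseness of the "about 12 hours" estimate, so treat them as order-of-magnitude only.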

The bench is not all wins. GPT-5.5 failed AISI's industrial-control-systems range, the Cooling Tower scenario, and AISI's note is the kind of detail that will travel: the model got stuck on the IT-side reconnaissance phases before it ever reached the OT-specific attack. Frontier model meets corporate IT and loses.

What AISI is willing to say out loud

The report's framing is the most consequential part of it, and it is not the part the headlines picked up.

When Mythos came out of the lab, the public narrative — including Anthropic's own framing — was that this was a step-change capability, the kind of thing that justified a new class of restraint. AISI's GPT-5.5 conclusion drops that framing without quite saying so:

"GPT-5.5 shows that rapid improvement on cyber tasks may be part of a more general trend. If cyber-offensive skill is emerging as a byproduct of more general improvements in long-horizon autonomy, reasoning, and coding, we should expect further increases in cyber capability from models in the near future, potentially in quick succession."

In other words: there is no Mythos-shaped exception. The capability AISI flagged in April is now the floor, and the floor was reached by a different lab in three weeks. The "withhold the dangerous one" lever Anthropic pulled — and which OpenAI mostly mirrored with Trusted Access — is a lever that a third or fourth lab can pull next, on a model the first two never controlled. That is a different game.

The six-hour jailbreak nobody is talking about

Here is the paragraph from the AISI report that should be the headline and isn't:

"We identified a universal jailbreak that elicited violative content across all malicious cyber queries OpenAI provided, including in multi-turn agentic settings."

A universal jailbreak across every malicious cyber query the vendor itself supplied. Six hours of expert red-teaming. AISI says OpenAI then "made several updates to the safeguard stack, though a configuration issue in the version provided meant UK AISI were unable to verify the effectiveness of the final configuration."

Read that twice. The patched version AISI received had a configuration issue. The institute could not confirm the patch worked. The model then went out the door to Trusted Access partners on the same day this report was published.

The Trusted Access program is real and the gating is real — KYC, vetted partners, banks and security vendors only. But the safety case for that gating leaned on the assumption that the safeguard stack inside the model holds. AISI's footnote is that they could not check, and the model shipped anyway.

This is the part of the story the security press has not surfaced. The benchmark numbers are easier copy; the configuration issue is buried at the bottom of a report that explains its own caveat in 28 words.

What this changes

Three things shift this week.

The first is that "the model that's too dangerous to ship" stops being a category. Anthropic kept Mythos in the lab on capability grounds. Under three weeks later, AISI measured the same capability on a model OpenAI is rolling out to its Trusted Access list. The narrative that Mythos was uniquely dangerous holds only if you don't read the next report. Anthropic's restraint may yet be vindicated by a different metric (agency, autonomy, the multi-step social engineering Mythos was reportedly stronger on), but the AISI bench, the one Anthropic itself cited, no longer separates the two models.

The second is that the AISI-as-trip-wire model has a bug. Both labs gated their cyber models against the same AISI threshold, but the threshold is a moving signal: passed in mid-April, passed again by the end of the month, and on track to be passed again. If "do not ship a model that completes the AISI cyber range" is the gating rule, then by the next checkpoint everybody will be shipping under exception. The rule needs an exit clause and nobody has written it.

The third is the jailbreak. A six-hour universal jailbreak in an early checkpoint, on a system that subsequently shipped with an unverifiable patch, is the kind of detail that ends up in a parliamentary committee transcript six months later. The labs that depend on a clean trusted-access narrative — we gate access, we verify partners, we tested the safety stack — have one less verifiable claim than they did on April 29.

What to watch

Whether AISI re-tests the production GPT-5.5 with the patched safeguard stack and publishes the result. The April 30 report leaves a verification gap on the jailbreak. The honest closure of that gap is a follow-up with a clean configuration. If it doesn't come, the gap stays open by default.

Whether Anthropic ships Mythos in any form, given that the capability case for keeping it withheld is now substantially weaker. Glasswing access is one thing; an API line is a different kind of decision. Watch the next Anthropic safety post for either a quiet expansion of Mythos availability or a reframed argument that doesn't rely on AISI numbers.

The third lab. AISI's "broader trend" line is a near-explicit prediction that DeepMind or Meta or DeepSeek will hit the same threshold this year. The interesting question is whether any of them adopts the gating architecture that Anthropic and OpenAI converged on, or whether the next entrant just ships and breaks the cartel. The first answer makes Trusted Access the standard. The second makes it the speed bump.

The week's frame everybody reaches for is "OpenAI catches up to Anthropic on cyber." That isn't the story. The story is that the model AISI used as the upper bound in April is the floor in May, and the institute's own conclusion is that more will follow, potentially in quick succession. The trip-wire is now a treadmill. The labs are figuring out the next argument for why their access-control architecture should hold up under it. AISI, to its credit, is not.

