The Loop  ·  Issue 017

A field journal of the AI frontier — for engineers who ship.

§ News

By AI Blog Editor
Apr 24, 2026 · 11 min read

DeepSeek V4 — open weights, a million-token context, and a price cut the frontier labs have to answer

DeepSeek shipped V4-Pro and V4-Flash under an MIT licence, with 1M-token context and prices roughly a twentieth of what Anthropic and OpenAI charge today. The company's own technical report calls the model three to six months behind the frontier — which is the point.

On April 24, 2026, DeepSeek pushed two checkpoints to Hugging Face — V4-Pro at 1.6 trillion parameters and V4-Flash at 284 billion — both under an MIT licence, both with a 1 million-token context window, and both priced on DeepSeek's own API at roughly a twentieth of what Anthropic and OpenAI charge for their current flagships.

The technical report frames the release as "approximately 3 to 6 months" behind the frontier. That is a sentence you will not find in any other lab's release notes, for reasons that become obvious once you read it twice.

The numbers DeepSeek actually put on paper

From DeepSeek's own API pricing page and the two Hugging Face model cards:

  • V4-Pro — 1.6T total parameters, 49B active per token, MoE architecture, MIT licence. $1.74 per million input tokens, $3.48 per million output.
  • V4-Flash — 284B total, 13B active, same architecture family, MIT licence. $0.14 per million input, $0.28 per million output.
  • 1M-token context on both, with up to 384K output tokens.
  • Claimed efficiency: 27% of V3.2's single-token inference FLOPs and 10% of its KV cache at 1M context — measured on their own model family, and the single technical claim most worth dwelling on.

Compare the pricing to the published frontier rates: GPT-5.4 runs $2.50 in / $15 out per million, Claude Opus 4.6 runs $5 / $25. V4-Flash undercuts Claude Opus's output tokens by 89×. V4-Pro undercuts it by about 7×. Even allowing for generous batching discounts and enterprise contracts, the gap is not rounding error.
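
Those ratios are plain division, but they compound over a real traffic mix. A minimal sketch, using only the per-million rates quoted above (the model keys are labels for this snippet, not API identifiers):

```python
# Back-of-envelope cost comparison using the per-million-token rates
# quoted above (USD per 1M tokens: input, output). Published list
# prices only; batching and enterprise discounts are ignored.
PRICES = {
    "deepseek-v4-pro":   (1.74, 3.48),
    "deepseek-v4-flash": (0.14, 0.28),
    "gpt-5.4":           (2.50, 15.00),
    "claude-opus-4.6":   (5.00, 25.00),
}

def cost(model: str, in_tok: int, out_tok: int) -> float:
    """Dollar cost of one request at list price."""
    p_in, p_out = PRICES[model]
    return (in_tok / 1e6) * p_in + (out_tok / 1e6) * p_out

# A long-context coding request: 200K tokens in, 8K tokens out.
for m in PRICES:
    print(f"{m:18s} ${cost(m, 200_000, 8_000):7.4f}")

# The output-token ratios quoted in the text:
print(25.00 / 0.28)   # ~89x: Opus output vs V4-Flash
print(25.00 / 3.48)   # ~7x:  Opus output vs V4-Pro
```

On that sample request the spread between Claude Opus and V4-Flash works out to roughly 40×, which is the kind of number procurement notices.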

How close to the frontier, really

OfficeChai pulled the benchmark table into one place, and the picture is more mixed than the price implies. V4-Pro's coding numbers are genuinely competitive:

  • LiveCodeBench: V4-Pro 93.5, Gemini 3.1 Pro 91.7, Claude Opus 4.6 88.8.
  • Codeforces rating: V4-Pro 3206, GPT-5.4 3168.
  • SWE-bench Verified: V4-Pro 80.6, Claude 80.8 — effectively tied.

The factual-recall story is worse. SimpleQA-Verified sits at 34.1 for V4-Flash against Gemini's 75.6, and HLE trails the frontier by a meaningful margin. The shape is familiar: V4 is excellent at problems that look like "write code that passes this test" and more ordinary at problems that look like "remember the right thing about the world."

The million-token context is not just a spec-sheet number. DeepSeek's efficiency paper is the most technically interesting part of the release — Compressed Sparse Attention and Heavily Compressed Attention stacked together, Manifold-Constrained Hyper-Connections on the residual path, the Muon optimiser on the training side. The claim that you can serve 1M tokens at one-tenth of V3.2's KV cache is the kind of thing that, if it holds up under independent testing, resets what the 1M-token tier is supposed to cost everyone else.
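
The report does not publish its serving math, so here is one way to sanity-check the scale of the claim: the standard KV-cache formula, with placeholder architecture numbers that are assumptions for illustration, not V4's published config.

```python
# Standard KV-cache size for a transformer with multi-head or
# grouped-query attention:
#   bytes = 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes/elem
# The layer/head numbers below are illustrative placeholders,
# NOT DeepSeek's actual architecture.
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_elem: int = 2) -> float:
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 2**30

# A hypothetical V3.2-class baseline at 1M-token context...
baseline = kv_cache_gib(layers=64, kv_heads=16, head_dim=128, seq_len=1_000_000)
print(f"baseline : {baseline:6.1f} GiB per sequence")   # ~488 GiB

# ...versus the claimed 10% footprint, however V4 achieves it
# (compressed attention, fewer cached heads, lower precision, etc.).
print(f"claimed  : {baseline * 0.10:6.1f} GiB per sequence")
```

Roughly half a terabyte of cache per 1M-token sequence is why long context is priced the way it is; cutting that to a tenth is what "repricing the tier" means in practice.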

The price is the story

Simon Willison's read is blunter than mine: "Almost on the frontier, a fraction of the price." That is the whole pitch.
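
Part of why the pitch bites is how little switching costs. DeepSeek's API has historically been OpenAI-compatible, so, assuming that holds for V4 and that the checkpoints ship under model ids like the hypothetical one below, migrating an existing client is a config change, not a project:

```python
# Hypothetical sketch. DeepSeek's API has historically been
# OpenAI-compatible, so switching backends is a base_url and a model id.
# The model name below is a guess; check the pricing page for the real one.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # instead of OpenAI's default
    api_key="YOUR_DEEPSEEK_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # hypothetical id for the 284B checkpoint
    messages=[{"role": "user", "content": "Write a binary search in Go."}],
)
print(resp.choices[0].message.content)
```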

It is not a coincidence that this release lands thirty-six hours after OpenAI's GPT-5.5 announcement. The last DeepSeek release (R1, January 2025) dragged free-tier access and pricing concessions out of OpenAI within two weeks. The industry pattern of the last three years is that DeepSeek does not have to beat the frontier on quality to force the frontier to respond on price — it only has to be close enough that the 20× premium stops being defensible to procurement.

Which brings up the uncomfortable question nobody at the frontier labs wants to answer in a deposition: if a 1.6T MoE with open weights can deliver 93.5 on LiveCodeBench at $3.48 per million output tokens, what exactly is the pricing floor on Claude Opus and GPT-5.4? "Better alignment" is not a line item on an AWS invoice.

The Huawei question

Reuters reported on April 3 that V4 would run on Huawei's newest Ascend chips — a claim DeepSeek has not directly confirmed or denied in the technical report. If that reporting holds, it is the second significant Nvidia substitution story of the month, after the Anthropic–Amazon Trainium expansion announced three days prior. Anthropic is training on AWS silicon. DeepSeek is — apparently — training on Huawei silicon. Nvidia is the party absent from both press releases, and that absence is starting to form a pattern.

Training a trillion-parameter MoE on domestic Chinese hardware is also the part of the story most likely to end up in an export-control briefing document. The US Commerce Department spent most of 2025 arguing about whether DeepSeek V3 had been trained on sanctioned H100 inventory. V4, if Reuters is right, sidesteps that fight by not needing the chips in the first place.

The admission at the top of the technical report

Back to the sentence the other labs would never print. DeepSeek writes, in their own paper, that V4-Pro "trails state-of-the-art frontier models by approximately 3 to 6 months", naming GPT-5.4 and Gemini 3.1 Pro as the reference points.

Three observations:

  1. None of OpenAI, Anthropic, or Google has ever said, in print, how far ahead of or behind one another they are. The number is always "frontier," never "three months behind frontier." DeepSeek being willing to name the gap is either strategic clarity or a strategic weapon — probably both.
  2. Three-to-six months is the gap at which a model is cheap enough to be the default backend for everything that doesn't genuinely need the absolute state of the art — which is most of what real applications actually do.
  3. If the gap holds at three-to-six months and V4's successor is also MIT-licensed, the frontier labs' moat is not capability. It is distribution, trust, and compliance paperwork. Those are real moats. They are not moats that can be priced at 20×.

What to watch

Three concrete things over the next ninety days will decide whether this release lands the way R1 did in January 2025.

  1. Whether GPT-5.5 or a Claude 4.8 ships a matching price cut. If the frontier labs hold the line, V4 takes the commodity tier. If they match, DeepSeek's own revenue model gets squeezed — which is worth watching on its own, because the math that lets DeepSeek charge $0.28 per million output tokens is not a secret DeepSeek can keep from its own backers forever.
  2. Whether the 1M-context efficiency claim survives independent benchmarking. The 10% KV-cache number is the single line in the technical report that, if true, reprices inference at the long-context end of the market. If it is true only on DeepSeek's own serving stack, the open-weights release is less useful than it looks.
  3. Whether Western inference providers pick V4 up. Together.ai, Fireworks, and Groq will decide in the next two weeks. The speed at which V4 shows up — or doesn't — on those platforms is the real market read on export-control risk, not anything that gets said out loud.

DeepSeek is not winning the race to the top. They have explicitly said, in their own technical report, that they are not trying to. What they are doing is far more disruptive to the economics of frontier labs than catching up would have been: shipping a capable-enough model open, cheap, and available — and letting the frontier labs explain, one earnings call at a time, why the premium still applies.

* * *

Thanks for reading. If a line here was useful, or plainly wrong, say so in the comments below.

Elsewhere in this issue

  1. News · A trillion-dollar Anthropic — the number that lives only on Forge · Apr 27, 2026
  2. News · Decoupled DiLoCo — Google teaches frontier training to survive a bad fibre and a dead chip · Apr 26, 2026
  3. News · GPT-5.5 "Spud" — twice the price, split benchmarks, and a polite request to start your prompts over · Apr 25, 2026

Letters

Arguments, corrections, questions. Anonymous comments allowed; be kind, be specific.