By AI Blog Editor
Jun 3, 2026 · 11 min read

Learned, not inherited — Microsoft's first reasoning model ships a benchmark chart that doesn't mention OpenAI

On June 2, 2026, Microsoft's Superintelligence team released MAI-Thinking-1 and MAI-Code-1-Flash at Build. The press release uses the word "inherited" once and the word "OpenAI" never. The competitor named on every chart is Anthropic.

The Wright brothers' first powered flight at Kitty Hawk, North Carolina, December 17, 1903. Photograph by John T. Daniels, public domain via Wikimedia Commons.

On Tuesday, June 2, 2026, at Build in Seattle, Microsoft's Superintelligence team released its first flagship reasoning model. It is called MAI-Thinking-1. It has roughly one trillion total parameters with 35 billion active in a sparse Mixture-of-Experts layout, a 256,000-token context window, and a benchmark sheet that lists 97.0% on AIME 2025, 94.5% on AIME 2026, and a SWE-Bench Pro score Microsoft describes as matching Claude Opus 4.6. In a 1,276-task blind side-by-side evaluation run by Surge, Microsoft says the model is preferred over Claude Sonnet 4.6. Alongside it, Microsoft shipped MAI-Code-1-Flash, a five-billion-active-parameter coding model that began rolling out the same day to every GitHub Copilot tier: Free, Pro, Pro+, and Max.

The headline number is none of those. The headline is the sentence Microsoft chose for the top of the announcement post:

"Capabilities should be learned, not inherited. We train MAI-Thinking-1 entirely without distillation from third-party models — and exclusively on data that is appropriately licensed."

Translated: no OpenAI. Trained without the partner Microsoft has been pricing into every cloud-and-AI earnings call since 2023.

What the comparison set tells you

Pull out the benchmarks Microsoft picked. AIME 2025. AIME 2026. SWE-Bench Pro. The 1,276-task Surge eval. The competitor named on every one of them is Anthropic — Sonnet 4.6, Opus 4.6, Haiku 4.5. The MAI-Code-1-Flash post lists a 51.2% SWE-Bench Pro pass rate against Claude Haiku 4.5's 35.2%, a +28.9-point lead on IF Bench over the same Haiku, and a token-efficiency claim of up to 60% fewer tokens on harder problems. The words "GPT" and "OpenAI" do not appear on either announcement page. That is a deliberate omission from a company whose product surface still runs OpenAI models in production. The model Microsoft chose to benchmark against is the one it does not pay revenue share on.

The April 2026 amendment to the Microsoft–OpenAI deal killed the cloud exclusivity and the AGI clause but preserved OpenAI's capped revenue share through 2030. Six weeks later, Microsoft published a flagship reasoning model and a benchmark chart that does not include the partner it pays the revenue share to. Both things can be true at once. The amendment said Microsoft could compete; this is what competing looks like.

Pieter Bruegel the Elder's 1563 oil painting The Tower of Babel, depicting an enormous spiral tower of unfinished stone rising into the clouds above a port city.

What "appropriately licensed" actually means

Simon Willison pulled the MAI-Thinking-1 technical paper on Tuesday afternoon and updated his own post within the hour. The press-release line is "clean and appropriately licensed data" with no third-party distillation. The technical paper describes the actual pre-training mix: a proprietary web crawl filtered down to roughly 794 billion pages, plus Common Crawl.

That is not a scandal. It is the same recipe every frontier lab uses. It is also not what most readers will hear when they read "appropriately licensed." Microsoft's framing turns on the difference between "we did not distill another model's outputs" (which the paper does support) and "we obtained licences for our training corpus" (which the paper does not support beyond the standard fair-use posture every other lab also relies on). The first sentence is a real engineering claim. The second is press-release laundering.

The honest version of the sentence would read: "We did not train on OpenAI outputs, and we trained on the same web everyone else trained on." That sentence does not make the slide deck.

What MAI-Code-1-Flash is actually for

The Copilot model is the more interesting product question. The numbers in the MAI-Code-1-Flash announcement are the kind of head-to-head comparisons Microsoft has not, historically, published in a launch post. The reason is in the same post: MAI-Code-1-Flash was "trained directly inside GitHub Copilot's production harness rather than benchmarked externally and then deployed." That is the engineering distinction worth caring about. The model was tuned on the agentic loop it ships into, not on a generic SWE-Bench setup.

The product implication is sharper than the benchmark. Every Copilot user, free tier included, is the rollout. The default model in the picker will, over the next few weeks, be the one Microsoft trained, whose training data it owns, and on which it pays no revenue share. This is the substitution that matters in the P&L. The reasoning model is the press release; the code model is the cost line.

The supporting cast nobody is reading about

Microsoft also shipped MAI-Image-2.5 (live in PowerPoint), MAI-Transcribe-1.5 (43 languages, a claimed five-times speed advantage), and MAI-Voice-2 (15+ languages with voice adaptation) on the same day, per The Thurrott Report's coverage. None of them earned a paragraph in the analyst notes that went out by Tuesday night. They will all be in the Microsoft 365 stack within a quarter. The story Microsoft is telling at Build is "we have our own stack now"; the story the press is hearing is "here is the reasoning model." The second one gets the column inches and the first one ends up in the 10-K.

What this means

Three takeaways.

1. The "no OpenAI" framing is the news, and the technical paper is the asterisk. Microsoft did the engineering work to train without distillation from third-party models, and that is genuinely worth saying: the previous in-house attempts under the MAI banner were thin enough that Mustafa Suleyman's team had to publish a flagship eventually or hand the narrative to Anthropic. Microsoft also chose the words "appropriately licensed" to do load-bearing work the citations cannot support. Both readings are correct. The headline is the first one. The compliance memo, if any of this is ever litigated, will be the second.

2. Benchmarks against Anthropic are the new neutral. The MAI posts mention Claude eight times across two announcement pages and OpenAI zero. That is the comparison set you publish when the model you would be embarrassed to lose to is the one you pay revenue share to. The April amendment did not free Microsoft to compete with OpenAI in the slides. It freed Microsoft to print a chart in which OpenAI is not present at all.

3. Copilot is where the substitution actually lands. MAI-Thinking-1 is in private preview on Foundry. MAI-Code-1-Flash is in every Copilot tier as of Tuesday. The reasoning model is the trophy; the code model is the unit economics. Watch the Copilot gross-margin disclosure in next quarter's earnings. That is the line that tells you whether MAI-Code-1-Flash is what it costs Microsoft to write code, or what it costs Microsoft not to pay OpenAI to write code. The two converge over time. They are not the same number on day one.

The Build keynote called the launch "the start of an exciting new chapter." The 10-K will call it a related-party-transaction disclosure footnote. Both descriptions will be in the same document by November.

* * *

Thanks for reading. If a line here was useful — or plainly wrong — the comments are below and the newsletter has your back.
