By AI Blog Editor
May 13, 2026 · 13 min read

The Stanford pipeline — OpenAI hires Gimlet Labs, the team that made Codex-Spark run on Cerebras

On May 12 The Information reported OpenAI has hired Gimlet Labs — the Menlo-backed startup whose software has been running GPT-5.3-Codex-Spark on Cerebras silicon since February. Gimlet's CEO is a Stanford research partner of Sachin Katti, who now runs OpenAI's compute.

12-inch silicon wafer. The Cerebras Wafer-Scale Engine 3 is built from a single wafer this size. Photo: Peellden, CC BY-SA 3.0 via Wikimedia Commons.

On May 12, 2026, The Information's Stephanie Palazzolo reported that OpenAI has hired Gimlet Labs, the San Francisco startup that has spent the last three months making GPT-5.3-Codex-Spark run on Cerebras Wafer-Scale Engines. Gimlet says its software speeds up inference by as much as 10x at the same cost and power. The detail that closes the loop, also in the Information piece: Sachin Katti — formerly Intel's CTO, now OpenAI's Head of Compute Infrastructure — was an early Gimlet advisor and coauthored research at Stanford with Gimlet's co-founder and CEO Zain Asgar.

That is the kind of disclosure that explains a launch the rest of the press took at face value. Codex-Spark has been live since February. OpenAI's $20 billion Cerebras commitment has been on the wire since April. The names of the engineers who actually wrote the kernels for the port surfaced this week, and they turned out to be the head of compute's old research partners.

What Gimlet actually does

Gimlet Labs, founded in 2023 by Asgar and Natalie Serrino — both formerly of Pixie Labs, which they sold to New Relic — emerged from stealth in October 2025 with what they call heterogeneous inference. The pitch, as Asgar told Chipstrat in March: take a PyTorch graph, trace it, find the segments that run best on which silicon, then lower each segment to that vendor's framework — TensorRT on NVIDIA, the Cerebras SDK on Cerebras, equivalents elsewhere. Their first published benchmark paired d-Matrix Corsair SRAM accelerators with NVIDIA B200s and shipped GPT-OSS 120B at "a 4x shift in the throughput-vs-interactivity Pareto frontier."
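The trace-partition-lower pitch can be made concrete with a toy sketch. This is not Gimlet's code or API. The op list, the cost table, and the helper names (`Op`, `COST_US`, `best_backend`, `partition`) are all hypothetical, and the numbers are invented for illustration. It shows one plausible reading of the idea: assign each traced op to the backend with the lowest estimated cost, then merge consecutive ops that share a backend into segments, each of which would be handed to that vendor's toolchain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    kind: str  # operator type, e.g. "matmul" or "attention"


# Hypothetical per-op cost estimates in microseconds. Invented numbers,
# not measurements of any real hardware.
COST_US = {
    "layernorm": {"nvidia": 3.0, "cerebras": 3.5},
    "matmul": {"nvidia": 10.0, "cerebras": 14.0},
    "attention": {"nvidia": 30.0, "cerebras": 12.0},
    "softmax": {"nvidia": 4.0, "cerebras": 2.0},
}

# Toolchains named in the article as lowering targets.
TOOLCHAIN = {"nvidia": "TensorRT", "cerebras": "Cerebras SDK"}


def best_backend(op: Op) -> str:
    """Return the backend with the lowest estimated cost for this op."""
    costs = COST_US[op.kind]
    return min(costs, key=costs.get)


def partition(ops: list[Op]) -> list[tuple[str, list[Op]]]:
    """Greedily merge consecutive ops that prefer the same backend."""
    segments: list[tuple[str, list[Op]]] = []
    for op in ops:
        backend = best_backend(op)
        if segments and segments[-1][0] == backend:
            segments[-1][1].append(op)
        else:
            segments.append((backend, [op]))
    return segments


# A stand-in for a traced graph: a linear sequence of ops.
trace = [
    Op("ln1", "layernorm"),
    Op("qkv", "matmul"),
    Op("attn", "attention"),
    Op("probs", "softmax"),
    Op("proj", "matmul"),
]

for backend, seg in partition(trace):
    names = ", ".join(op.name for op in seg)
    print(f"{TOOLCHAIN[backend]}: {names}")
```

A real partitioner would also have to price the cost of moving activations between devices at every segment boundary, which this greedy pass ignores. That transfer cost is presumably where much of the hard engineering sits.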

The company raised an $80 million Series A led by Menlo Ventures in early 2026 and disclosed eight-figure annualised revenue at emergence — a rare combination at that stage. Asgar himself is a former NVIDIA GPU architect and Google AI engineering lead; the Stanford adjunct CS professorship on his bio is the part that matters here.

The Stanford connection

Sachin Katti spent more than fifteen years on the Stanford CS faculty before joining Intel as Chief Network Officer and then CTO. Tom's Hardware reported in November 2025 that Katti left Intel after seven months as CTO and AI chief to run OpenAI's compute infrastructure. His own tweet on arrival: "Excited for the opportunity to work with @gdb, @sama and the @OpenAI team on building out the compute infrastructure for AGI!" The Stanford research that ties him to Asgar predates both of their current roles by several years — they were academic collaborators on heterogeneous-hardware performance work, and Katti became an early Gimlet advisor when the company spun up in 2023.

That is, by itself, not a story. Half of Silicon Valley is a former Stanford coauthor of someone. The story is what happened next.

When OpenAI shipped GPT-5.3-Codex-Spark on February 12, announced on OpenAI's blog with a partner post from Cerebras the same day, it was the company's first production model running solely on non-NVIDIA hardware. Codex-Spark exceeds 1,000 tokens per second on the Cerebras WSE-3 with a 128k context window, and is served as a research preview through the Codex app, CLI, and VS Code extension. The Cerebras blog quoted Katti by name: "Cerebras has been a great engineering partner, and we're excited about adding fast inference as a new platform capability."

The standard reading at the time was that this was a Cerebras achievement — the WSE-3's SRAM-resident weights make 1,000 tokens-per-second feasible in a way GPUs cannot match for models small enough to fit the die. That reading was incomplete. The Information's reporting this week makes the harder part visible: getting an OpenAI training-stack model to actually serve on a wafer-scale accelerator is not a single-vendor problem. It is a graph-rewriting, kernel-tuning, scheduler-orchestration problem, and Gimlet had been doing it for OpenAI under a quiet advisory-and-services arrangement since at least last autumn.

The May 12 hire converts the relationship from advisor to org chart.


The $20 billion that bought the chips

Two months after the Codex-Spark launch, on April 16, The Information reported that OpenAI had agreed to spend more than $20 billion over three years on Cerebras chips, with up to $30 billion in cumulative commitments and $1 billion of OpenAI funding earmarked for new Cerebras data centers. The deal includes warrants that can convert to roughly a 10% stake in Cerebras as OpenAI hits spend tiers. Cerebras is targeting a Q2 2026 IPO at a reported $35 billion valuation.

The capital side of that announcement was loud. The engineering side was silent until this week. The $20 billion buys racks. It does not buy the compiler work that takes a PyTorch graph trained on H100s and lowers it to a wafer with 900,000 cores and a memory hierarchy that has no analogue on NVIDIA. That work was, and is, Gimlet's.

It is a familiar shape. OpenAI committed more than $20 billion to the chips, sent the chip company a billion for the data center build, took warrants for up to a tenth of the equity, then went out and wrote an offer letter to the dozen engineers who actually knew how to compile against the wafers. The offer letter is the cheapest line in the deal, and the one without which the other eleven figures of capex are inert.

Why this is the right hire

The labs-versus-incumbents read on the AI build-out has, for two years, been about chips, GPUs, and gigawatts. The Codex-Spark launch was a quiet milestone for a different reason. It demonstrated that a frontier-lab model can be ported to a non-NVIDIA stack in production, at a latency profile NVIDIA cannot match for small models, and that the porting cost — measured in engineer-quarters, not in dollars — is tractable if you have the right team.

NVIDIA's moat is famously CUDA and the libraries that have been accreting around it for a decade. Gimlet's pitch, and the reason its $80M round and eight-figure ARR don't read as crazy, is that the moat is real but penetrable if you have a graph-level orchestrator that can hand sub-graphs to different vendors' kernels without rewriting the whole inference stack. The speedup of up to 10x at the same cost and power, which is Gimlet's own claim as relayed by The Information, is the headline number. The harder claim underneath is that you can stop choosing one vendor.

If that holds, the implications for NVIDIA's pricing power are not subtle. The labs have been buying NVIDIA because, until very recently, nobody had shown they could run a state-of-the-art model on anything else without paying a year of engineering tax. OpenAI's Cerebras port and the Gimlet hire are the first credible counterexample.

What to watch

The retention curve at Gimlet. Acqui-hires of $80M-Series-A companies with eight-figure revenue do not have a great track record once the founders' cliff vests. Asgar is on his second exit (Pixie went to New Relic). Whether he stays at OpenAI long enough to ship more than Codex-Spark, or whether he is gone within eighteen months with a non-compete that lapses in 2029, is the leading indicator on whether the buy-the-team thesis pays off.

The next non-NVIDIA model. Codex-Spark is the smallest model OpenAI is willing to ship right now: 128k context, text-only, research preview. The next ports will be the test. Can the same Gimlet pipeline lower the full Codex line, or a vision model, or (the milestone that would dent the NVIDIA narrative) a frontier multimodal Opus-class workload, onto Cerebras or TPU or Trainium silicon? If the answer is yes within two quarters, the heterogeneous-inference thesis stops being a thesis.

What Anthropic and Google do. Anthropic is renting Colossus 1 from xAI and training on Trainium 2 with AWS. Google is on TPUs. The lab that has not had a public "first non-NVIDIA production model" moment is the one without a vertically integrated chip story; with this hire, OpenAI just took that off the table for itself. The pressure to hire the next Gimlet (there are not many) moves to the labs whose chip story still depends on Jensen.

Codex-Spark looked like a Cerebras milestone in February. With the Gimlet disclosure, it now reads as the first visible step in a longer programme: building a heterogeneous inference platform inside OpenAI, on the foundation of Stanford research partnerships that long predate any of the current org charts. The chips are the cheque. The compiler is the moat.

