By AI Blog Editor
Jun 25, 2026 · 19 min read

A chip called Jalapeño — OpenAI ships its first custom silicon, designed in nine months with Broadcom, into a 26-gigawatt compute pile

On June 24, 2026, OpenAI and Broadcom unveiled Jalapeño — OpenAI's first custom inference chip, taken from blank-slate design to manufacturing tape-out in nine months. It is the third leg of a 26-gigawatt commitment that now spans Nvidia, AMD and Broadcom silicon.

A close-up colour photograph of ripened red jalapeño peppers piled in a white ceramic bowl, taken in July 2012. The peppers are the jalapeño cultivar's mature red form, the same variety better known in its green unripe state. OpenAI and Broadcom on June 24, 2026 named their first custom AI accelerator "Jalapeño" — a chip designed from blank-slate architecture to manufacturing tape-out in nine months, intended for LLM inference at gigawatt scale, with initial deployment starting late 2026 and a multi-generation roadmap stretching to Hock Tan's stated 2028 full ramp. — Ripe jalapeño peppers in a bowl. Photograph by jeffreyw (2012), CC BY 2.0 via Wikimedia Commons.

On Wednesday June 24, 2026, OpenAI and Broadcom unveiled Jalapeño — OpenAI's first custom AI accelerator, built for large-language-model inference, scheduled for initial deployment at the end of 2026. The chip was co-developed from blank-slate design to manufacturing tape-out in nine months, with engineering samples already running production workloads including a model the press release names GPT-5.3-Codex-Spark. It is fabricated and integrated through Broadcom, with Celestica supplying boards, racks and system integration, and uses Broadcom's Tomahawk Ethernet networking for the cluster interconnect.

Eight months ago, OpenAI was a model lab buying compute from Microsoft and Nvidia. In June 2026 it is a model lab that designs accelerators in-house, has booked 26 gigawatts of compute across three silicon vendors, and is shipping its own silicon into the first gigawatt of that pile. The story is not that OpenAI built a chip. The story is the speed it built one in, and the size of the pile it is building one for.

Nine months from blank slate to tape-out

The number that should land first is nine. Custom AI accelerators at this complexity typically take 18 to 24 months from architecture to tape-out, with another six to nine months to qualify the silicon. Google's first TPU took the better part of two years. AWS Trainium took longer. The working assumption, including inside Broadcom's own product playbook, is that a first-of-its-kind ASIC, designed for a workload the customer has only had three years to study, is a two-year project. OpenAI and Broadcom did it in nine.

How? The company's published explanation, repeated by Constellation Research, TechCrunch, and Tech Startups, is that OpenAI used its own models to compress parts of the design, verification and optimisation loop — the same way Cursor's customers use Codex to compress a software-engineering loop. Richard Ho, OpenAI's hardware lead, said the chip was "designed from the ground up for LLM inference using detailed insights from collaboration with researchers." Translation: the engineers writing the SystemVerilog were not guessing at what the workload looked like. They had the workload running, on Nvidia, while they wrote the chip that would replace it.

If the nine-month timeline holds up under closer scrutiny, two things follow. First, the duration of an ASIC project is no longer set by how long it takes humans to write and verify the RTL — it is set by how long it takes to physically tape out the design. Second, the moat that protected Nvidia for a decade — that custom silicon was too slow to design — has just gotten a great deal shallower. That is a sentence that costs Jensen Huang $300 billion of market cap, the day the market decides it is true.

The 26-gigawatt pile

Jalapeño does not arrive into an empty compute roadmap. Across September and October 2025, OpenAI announced three back-to-back deals that, per TrendForce's tally, now sum to 26 gigawatts of committed AI compute:

10 GW of Nvidia Vera Rubin systems, tied to Nvidia's planned investment of up to $100 billion in OpenAI, with the first GW deploying in H2 2026.
6 GW of AMD Instinct GPU clusters over multiple years, first GW also in H2 2026.
10 GW of OpenAI-designed accelerators built with Broadcom, announced October 13, 2025, with deployments starting in H2 2026 and completing by the end of 2029. Jalapeño is the first product of this deal.

Twenty-six gigawatts. For context, the United Kingdom's peak electricity demand on a winter evening is about thirty gigawatts. New York City at the height of an August heatwave clears twelve. OpenAI is committing, on paper, to roughly twice the peak load of America's largest city for one company's AI workloads, training and inference together, across three suppliers, none of whom share an architecture.
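The tally and the comparisons above are simple enough to check by hand, but a short sketch makes the arithmetic explicit. The commitment figures are the ones listed in this piece; the two peak-demand figures are the round numbers quoted in the paragraph above and are not independently verified here. The function and variable names are illustrative only.

```python
# Commitments as listed in the article, in gigawatts.
COMMITMENTS_GW = {
    "Nvidia Vera Rubin": 10,
    "AMD Instinct": 6,
    "OpenAI/Broadcom custom accelerators": 10,
}

# Round comparison figures quoted in the article (assumptions, not verified).
REFERENCE_PEAKS_GW = {
    "UK winter-evening peak (article figure)": 30,
    "New York City heatwave peak (article figure)": 12,
}


def total_commitment_gw(commitments):
    """Sum the per-vendor commitments."""
    return sum(commitments.values())


def ratios_to_references(total_gw, references):
    """Express the total as a multiple of each reference peak."""
    return {name: total_gw / peak for name, peak in references.items()}


if __name__ == "__main__":
    total = total_commitment_gw(COMMITMENTS_GW)
    print(f"Total committed: {total} GW")
    for name, ratio in ratios_to_references(total, REFERENCE_PEAKS_GW).items():
        print(f"  {ratio:.2f}x {name}")
```

On the article's own figures, the total works out to a little over twice the New York number and a little under the UK one.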

The Stargate framing inside OpenAI's October release ties all three to the $500 billion Stargate Project. The cleaner reading is that OpenAI has decided that none of the three silicon vendors can be allowed to become the company's single point of failure. Nvidia for training. AMD for the second-source training pool. Broadcom-built OpenAI silicon for inference. Whichever bottleneck the company hits first — fab capacity at TSMC, HBM allocation at SK Hynix, or Nvidia's pricing — it has an alternative wired up before the bottleneck binds.

Inference, not training — and the unit economics

The decision to build for inference first is the strategic one. Training compute is bought rarely, in big lumpy contracts, and the dominant unit cost is the depreciated capex on the GPUs. Inference compute is consumed every second, and the dominant unit cost is the energy bill, which the chip's performance per watt sets. A 20% improvement in inference performance per watt, applied to a workload OpenAI is already running at hundreds of megawatts, pays for the chip's non-recurring engineering (NRE) cost in weeks.
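How quickly a performance-per-watt gain repays a one-off design cost depends on inputs neither company has published. The sketch below is a back-of-envelope calculator, not a vendor model: the fleet size, the dollar value assigned to each megawatt-hour, and the NRE figure are all hypothetical parameters the reader has to supply. It relies on one piece of arithmetic worth stating: a 20% gain in work per joule cuts the energy needed for the same work by 1 - 1/1.2, about 16.7%, not by 20%.

```python
HOURS_PER_DAY = 24


def energy_fraction_saved(perf_per_watt_gain):
    """Fraction of energy saved for the same work.

    A gain of 0.20 means 1.2x the work per joule, so the same work
    needs 1 / 1.2 of the original energy.
    """
    return 1.0 - 1.0 / (1.0 + perf_per_watt_gain)


def payback_days(fleet_mw, perf_per_watt_gain, usd_per_mwh, nre_usd):
    """Days of savings needed to cover a one-off NRE cost.

    usd_per_mwh is whatever cost basis the reader chooses (hypothetical):
    the electricity price alone, or an all-in figure that also folds in
    hardware depreciation per megawatt-hour of capacity.
    """
    saved_mw = fleet_mw * energy_fraction_saved(perf_per_watt_gain)
    usd_saved_per_day = saved_mw * HOURS_PER_DAY * usd_per_mwh
    return nre_usd / usd_saved_per_day


if __name__ == "__main__":
    # All inputs below are placeholders for illustration only.
    days = payback_days(fleet_mw=300, perf_per_watt_gain=0.20,
                        usd_per_mwh=60, nre_usd=500e6)
    print(f"Payback under these assumed inputs: {days:,.0f} days")
```

The payback period scales inversely with the cost basis. Pricing the saved megawatt-hours at the electricity rate alone gives a much longer payback than pricing them at an all-in cost that includes the hardware they free up, so a "weeks" figure implies the broader basis.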

Jalapeño's stated claim is "performance per watt substantially better than current state-of-the-art." OpenAI has not put a number to that yet. TechTimes and Tony Reviews Things both report a roughly 50% cheaper inference target for the workloads the chip is dimensioned for; that number does not appear in the OpenAI or Broadcom press releases the Loop could verify, so treat it as a directional figure from the analyst chatter rather than a vendor commitment.

The press materials are clearer on the platform architecture. Constellation Research reports that the design "reduces data movement and balances compute, memory and networking" — the standard description of an inference-optimised architecture trading peak FLOPs for memory-bandwidth-per-FLOP, the choice that matters when the kernel is autoregressive token generation rather than matmul-bound training. Broadcom is supplying Tomahawk Ethernet for the cluster fabric — not InfiniBand — which positions the platform inside the Ethernet-for-AI camp that has been building since the Ultra Ethernet Consortium coalesced in 2023.
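Why memory bandwidth per FLOP matters for token generation can be sketched with the standard roofline model. This is a simplified illustration, not a description of Jalapeño: it assumes a dense model whose weights are each read once per decode step and used for one multiply-add per sequence in the batch, it ignores KV-cache and activation traffic, and the chip figures in the example are hypothetical.

```python
def decode_arithmetic_intensity(batch_size, bytes_per_weight):
    """FLOPs per byte of weight traffic for one decode step.

    Each weight contributes one multiply-add (2 FLOPs) per sequence in
    the batch and is read from memory once per step (assumption).
    """
    return 2.0 * batch_size / bytes_per_weight


def ridge_point(peak_flops, mem_bandwidth_bytes_per_s):
    """Intensity at which a chip moves from memory-bound to compute-bound."""
    return peak_flops / mem_bandwidth_bytes_per_s


def attainable_flops(intensity, peak_flops, mem_bandwidth_bytes_per_s):
    """Roofline: throughput is capped by compute or by memory bandwidth."""
    return min(peak_flops, intensity * mem_bandwidth_bytes_per_s)


def is_memory_bound(intensity, peak_flops, mem_bandwidth_bytes_per_s):
    """True when the workload sits to the left of the ridge point."""
    return intensity < ridge_point(peak_flops, mem_bandwidth_bytes_per_s)


if __name__ == "__main__":
    # Hypothetical chip: 1 PFLOP/s peak, 4 TB/s memory bandwidth.
    peak, bandwidth = 1e15, 4e12
    # Batch of 8 sequences with 1-byte (8-bit) weights.
    intensity = decode_arithmetic_intensity(batch_size=8, bytes_per_weight=1)
    achieved = attainable_flops(intensity, peak, bandwidth)
    print(f"intensity: {intensity} FLOP/byte, ridge: {ridge_point(peak, bandwidth)}")
    print(f"memory-bound: {is_memory_bound(intensity, peak, bandwidth)}")
    print(f"fraction of peak usable: {achieved / peak:.1%}")
```

Under these assumptions, small-batch decoding sits far below the ridge point, so extra peak FLOPs go unused while extra memory bandwidth raises throughput directly. That is the trade the press materials describe.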

A close-up colour photograph of the polished mirrored surface of a 12-inch silicon wafer, taken in 2019, showing the reflective surface tilting through the colour spectrum at the edges as light bends across the crystalline silicon. Jalapeño is the first OpenAI-designed AI accelerator to reach manufacturing tape-out, built with Broadcom and integrated by Celestica into rack-level systems; initial deployment is targeted for the end of 2026 with Hock Tan's stated ramp peaking in the first half of 2028.

The Microsoft sentence inside the Broadcom press release

The single most quoted line from the announcement comes from Hock Tan, Broadcom's CEO. His full quote, per Tech Startups' write-up of the press release, is: "This is just the beginning of a multi-generation roadmap. By co-developing our industry-leading silicon directly with OpenAI, we are enabling the deployment of gigawatt-scale data centers with Microsoft and other partners beginning in 2026."

Microsoft is named. The Loop wrote on Monday about Microsoft's 20-year Chevron gas PPA in Pecos, Texas — 2 gigawatts of behind-the-meter generation, scaling to 2.67 GW at full buildout, first power 2028. Hock Tan's sentence reads like the silicon half of the same campus plan. Microsoft has the power. Broadcom has the racks. OpenAI has the model. The triangle closes when Jalapeño ramps in late 2026 and Pecos energises in 2028 — and the AI workload the campus is sized for runs on chips OpenAI designed, not chips OpenAI bought.

On the same earnings cadence, Hock Tan told CNBC there would be "small prototype development" in late 2026 and that the platform would "start seeing it really ramp up in '27 and really going full tilt in first half '28." That is the same window Pecos powers up. The same window Cursor's SpaceX merger is supposed to close, with xAI's Colossus next door. Different vendors, same calendar.

Greg Brockman's sentence

The OpenAI-side quote that travelled is Greg Brockman's: "The world is moving to a compute-powered economy. Jalapeño is part of our long-term full-stack infrastructure strategy to make compute more abundant, resulting in AI which is faster, more reliable, more affordable for people and businesses, and can be used to solve more important problems."

The phrase full-stack infrastructure strategy is the news. OpenAI has spent the last twelve months publicly framing itself as a model lab. Brockman's sentence reframes the company, for the first time on the record, as an integrated buyer-and-designer of every layer beneath the model: training compute (Nvidia, AMD), inference silicon (Broadcom), networking (Broadcom Tomahawk), system integration (Celestica), and the energy contract that feeds the rack (Microsoft Pecos). The model is one layer. The press release is announcing the other five.

What to watch

The performance number. Jalapeño's stated edge is "substantially better" performance per watt than current state-of-the-art. The technical disclosure is promised for a future release. If, when it lands, the chip is within 5–10% of Nvidia Vera Rubin on the inference workloads OpenAI runs in volume, the procurement story is mixed. If it is 30%+ better, OpenAI has the unit-economics case to shift the inference pool to Jalapeño as fast as Celestica can build the racks, and the Vera Rubin 10 GW commitment quietly becomes a training-only allocation.
The fab. Neither party has named the foundry. The market assumption is TSMC's N3 family. If the part lands on Intel 18A or Samsung's gate-all-around node instead, the announcement is also a foundry diversification announcement — and the second-order beneficiary is not Broadcom.
Anthropic's and Google's response. Google has had a custom inference accelerator stack (TPU) for a decade. Anthropic runs on AWS Trainium under the $4 billion AWS partnership and has been publicly silent on whether it is designing its own silicon. If an Anthropic-with-AWS-with-Trainium-3 announcement lands inside ninety days, the inference-silicon market has bifurcated. If it does not, Anthropic is conceding the cost-per-token race on the chip layer and betting that the safety case is the moat.
The 26-GW number itself. The three deals stack to 26 GW on paper. The pace at which the gigawatts arrive on the grid is the harder question. Microsoft, Oracle, CoreWeave and Stargate are the supply-side names; each has its own grid, gas, water and zoning timeline. If the H2 2026 first-GW slips on any of the three suppliers, the inference roadmap that Jalapeño anchors slips with it. Watch the ERCOT queue and the 438-gigawatt large-load filings the Loop covered on Tuesday — that is where Stargate's first GW has to land.

The week OpenAI announced its first chip, the company stopped being a tenant of the compute economy and became a landlord with three suppliers under non-exclusive lease. The model layer is still where the conversation happens. The 26 gigawatts underneath it are where the money has already moved.

* * *

Thanks for reading. If a line here was useful — or plainly wrong — the comments are below and the newsletter has your back.
