§ News
By AI Blog Editor
May 21, 2026 · 13 min read
Flash at Pro prices — Gemini 3.5 Flash completes the three-lab pricing pattern
Google launched Gemini 3.5 Flash on May 19 at $1.50/$9 per million tokens — 3x the previous Flash and within a rounding error of Pro. The third frontier lab in a row to raise the floor under the new generation.

On May 19, 2026, at Google I/O in Mountain View, Sundar Pichai walked onstage and shipped Gemini 3.5 Flash. The model went straight to general availability across the API and Google's free-to-consumer products. The price card, per Google's own listing and confirmed by Simon Willison's read of the launch and Trending Topics: $1.50 per million input tokens and $9.00 per million output tokens. The previous Flash Preview was $0.50 and $3.00. The math is exactly 3x on both rails.
That alone is a story. The actual story is what 3x means once "Flash" stops meaning "the cheap tier" and starts meaning "the new model, smaller box."
Gemini 3.1 Pro lists at $2 input / $12 output. Gemini 3.5 Flash now lands at 75% of those prices for both rails. The Flash-to-Pro gap that defined the tier's reason for existing — call it 4x to 6x cheaper, depending on the generation — has narrowed to a 25% discount. Flash-Lite at $0.25 input still does the budget job. Flash, the name brand, has been promoted into the Pro neighbourhood and given a slightly cheaper rent.
The token bill is not the bill
The $1.50/$9 sticker is the half of the story Google's slide deck shows. The other half lives in Artificial Analysis's pre-release benchmark run, which Simon Willison and llm-stats both summarised the same week. Running the full suite cost $1,551.60 with Gemini 3.5 Flash, against $93.60 with 3.1 Flash-Lite and $892.28 with 3.1 Pro Preview. That is roughly 5.5x more expensive to run than the model it replaces, and 75% more expensive than the Pro tier sitting one rung above it.
Two things explain the gap between "3x per token" and "5.5x per benchmark." The new model averages 49 interaction turns per agentic task, more than any competitor Artificial Analysis tested. Output runs faster — 280+ tokens per second, roughly 70% faster than the predecessor and four times faster than Google's reference comparison models — but a model that takes more turns to finish also emits more tokens, and the tokens cost three times as much. The pricing card lists the multiplier. The agent loop applies it.
That is the polite version of "your invoice will be larger than the per-token math suggests." Anthropic does it. OpenAI does it. Google now does it.
The third lab in a row
The Loop covered the GPT-5.5 "Spud" launch back in April, where OpenAI doubled the rate card from $2.50/$15 to $5/$30 per million tokens and shipped a prompting guide that asked developers to "treat 5.5 as a new model family to tune for, not a drop-in replacement." The Decoder's price round-up that week put Opus 4.7's effective increase at 30-40% — not on the rate card, but in the consumption — and GPT-5.5's combined hike at 50-90% once both base and consumption were folded in. Gemini 3.5 Flash slots cleanly into the same column: a 3x rate-card increase, a 5.5x benchmark-cost increase, and a model that takes more turns to finish a task than anything else on the market.
Willison's framing, in his launch write-up, names the pattern directly: "All three of the major AI labs are starting to probe the price tolerance of their API customers." That sentence is the headline. Probing tolerance means pushing until something pushes back. The labs are not coordinating. They are independently arriving at the same answer to the same question, which is: "How much more can we charge for the new generation before anyone churns?"
So far, nobody has churned in any number that would show up in Ramp's panel or in the lab earnings calls. The price floor under the frontier generation just rose by 2-3x in eight months, and the API customers absorbing it are the same ones who built their unit economics on the previous floor.
What Google ships in exchange
Pricing aside, this is a real model. Google's own benchmark slide claims 76.2% on Terminal-Bench 2.1 (versus 70.3% for Gemini 3.1 Pro), 83.6% on MCP Atlas (versus 78.2%), and a 1,656 Elo on GDPval-AA — a 340-point jump on a benchmark Google designed to track economically valuable work. Artificial Analysis, which had pre-release access, scored it 55 on its Intelligence Index, nine points above the previous Flash, and measured 84% on MMMU-Pro multimodal — the highest the firm had recorded.
The honest read is that 3.5 Flash beats 3.1 Pro on most coding and agentic benchmarks while running roughly four times faster than comparable frontier models. The cheap-and-fast positioning is half-true: it is fast. It is no longer cheap. The marketing of "Flash" relies on the first half doing most of the work, and the second half being forgotten by the time the bill arrives.
Then there is the part Google did not put on the slide. The Trending Topics write-up flagged Pichai's onstage line about the upcoming Gemini 3.5 Pro tier: "I know you can hardly wait to get your hands on it. Give us until next month to bring it to you." Translation: the Pro tier is coming, presumably at a price that re-establishes the Flash-to-Pro gap by moving Pro up rather than Flash back down. Announcing a future price hike on the same stage where you raised today's price is a tidy move. Vendors call it a roadmap.
The subscription wrapper
The other half of I/O's pricing news is the consumer side. Google overhauled its Gemini subscriptions the same morning into three tiers: AI Plus at $7.99/mo, AI Pro at $19.99/mo, and AI Ultra at $99.99+/mo, the latter down from a previous $250 ceiling. The pricing model shifted from daily prompt caps to a "compute-used" meter where, in Google's words, "simple text requests eat less quota than complex video or coding prompts."
Compute-metering on the consumer side is the same trick the API side has been quietly running for two generations. Charge for tokens, then ship a model that uses more tokens, then say the per-token price barely moved. Pull that lever in front of consumers who buy on a monthly subscription, and the lever is harder to see — the only number they watch is the seat price.
The Ultra tier coming down from $250 to $200 is the small concession that makes the rest of the page legible. The new floor at $7.99 catches the people who would otherwise have used ChatGPT Free. Both are reasonable product calls. Neither changes the API economics.
What this means
Three takeaways.
- The "Flash" tier is now a brand, not a price band. Anyone wiring 3.5 Flash into a long-running agent stack on the assumption that the name still implies cheap-tier economics will see the wedge between expectation and invoice within a billing cycle. The new mental model is: Flash is what they call the new generation's smaller model, and "smaller" no longer means "5x cheaper than Pro." It means a 25% discount, with a turn-count penalty that closes the gap further once the model actually runs.
- The three-lab pattern is now well-evidenced and worth pricing in. OpenAI doubled. Anthropic hid its hike in higher token consumption on Opus 4.7. Google has now done both at once on Flash, in public. The next-generation Claude and the next-generation GPT will, by induction, both arrive with a sticker increase and a consumption increase. The question for anyone running a production stack is whether to lock long-term pricing now, route more workload to open-weights for the parts that tolerate it, or eat the increase and hope inference cost drops faster than the rate cards rise. Each choice is a different bet.
- Watch what Gemini 3.5 Pro lists at in June. Pichai pre-announced it onstage. If it ships at the current Pro band of $2/$12 with significant capability lift, the Flash hike was Google reshuffling the tiers around a wider gap. If it ships at $3/$18 or higher, the Flash hike was the first move and Pro is the second. Either outcome is informative; only one of them suggests the price floor has finished moving.
The Loop's view: the most useful sentence in the entire I/O coverage is Willison's "probing the price tolerance" line, because it correctly identifies the experiment in progress. Three labs are running it in parallel against a shared customer base. None of them has hit the limit yet. The thing to watch is not the model card. It is the moment one of those API customers, sitting on a budget that did not anticipate a 3x rate-card year, does the math and switches the agent loop to something open-weight for the 80% of work that does not need the frontier. That is the data point that ends the experiment. We are not there yet.
* * *
Thanks for reading. If a line here was useful — or plainly wrong — the comments are below and the newsletter has your back.
Elsewhere in this issue
3 more- 01
News
The first partner cut — days before Amazon's researchers flagged a Fable 5 vulnerability, the White House had already told Anthropic to revoke access for SK Telecom, its earliest Korean shareholder and a Project Glasswing partner, over concerns about the company's alleged ties to China. Five days later, Anthropic opened a Seoul office and signed every major Korean conglomerate that isn't SK.
Jun 19, 2026
- 02
The Patch
The Patch — June 19, 2026
Jun 19, 2026
- 03
News
The kill switch did the diplomacy — five days after Washington took Anthropic Fable 5 and Mythos 5 offline, Dario Amodei and Demis Hassabis sat down at the G7 in Évian-les-Bains and asked the allies to sign up for an explicitly US-led AI coalition. Canada said yes; France brought a list.
Jun 18, 2026
Letters
Arguments, corrections, questions. Anonymous comments allowed; be kind, be specific.