§ Guides
By AI Blog Editor
Apr 19, 2026 · 1 min read
When extended thinking earns its cost
Thinking helps when the model would commit to the wrong first answer. It doesn't help on tasks the first pass already nailed. A pragmatic guide with a directional chart.
Extended thinking gives Claude a scratchpad before it answers. Inside that scratchpad, the model can try things, discard them, reconsider, and form a plan. You pay for those thinking tokens and you don't see most of them in the final response. The question is when that trade is worth it.
The honest answer is "not always, and not by default." Thinking helps on problems where the model would otherwise commit to the wrong first answer. It doesn't help on problems where the first answer was already going to be right.
Three kinds of task
No thinking needed. Typos, summarising a short doc, answering a factual question, running one tool call. The first pass is already accurate; thinking just burns tokens.
Marginal gains. SQL from English, code explanations, lightweight analysis. Thinking nudges accuracy a few points. Worth it in a batch job that runs overnight; not worth it on a latency-sensitive path.
Big wins. Multi-step math, planning a refactor, untangling an ambiguous spec, debugging a trace, anything with "think step by step" in the prompt. These tasks genuinely benefit from letting the model work things out before committing.
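The three tiers above can be sketched as a routing table. Everything here is illustrative — the task labels, the budget values, and the latency rule are assumptions to tune against your own evals, not recommended settings:

```python
# Hypothetical routing table: task category -> thinking budget in tokens.
# Labels and numbers are illustrative; calibrate against your own evals.
THINKING_BUDGETS = {
    "typo_fix": 0,               # no thinking needed
    "summarise_short_doc": 0,
    "sql_from_english": 2_000,   # marginal gains
    "code_explanation": 2_000,
    "multi_step_math": 10_000,   # big wins
    "refactor_plan": 10_000,
    "debug_trace": 10_000,
}

def thinking_budget(task: str, latency_sensitive: bool = False) -> int:
    """Return a budget_tokens value for a task; 0 means leave thinking off."""
    budget = THINKING_BUDGETS.get(task, 0)
    # On a latency-sensitive path, only the "big win" tier earns its cost;
    # the marginal-gains tier is reserved for overnight batch jobs.
    if latency_sensitive and budget < 10_000:
        return 0
    return budget
```

The point of the table is that the decision lives in one place, so when evals move a task between tiers you change one number, not every call site.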
The cost knob
You don't turn thinking on or off; you give it a budget — budget_tokens. Small budgets help for light planning. Larger budgets help on harder problems, up to a point: once the model has thought "enough," more tokens don't buy more quality. For most production uses, a few thousand is a reasonable starting point.
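As a minimal sketch of what that knob looks like on the wire — the model id is illustrative, and note that thinking tokens count against the output limit, so max_tokens must leave room above the budget:

```python
def build_request(prompt: str, budget_tokens: int) -> dict:
    """Build a Messages-API request body with an extended-thinking budget.

    Sketch only: the model id is illustrative, and max_tokens is set above
    budget_tokens because thinking tokens count toward the output limit.
    """
    return {
        "model": "claude-sonnet-4-20250514",  # illustrative model id
        "max_tokens": budget_tokens + 2_000,  # headroom for the final answer
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }
```

The budget is a ceiling, not a target: the model can stop thinking early, and you are billed for the thinking tokens actually produced.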
Importantly: thinking tokens are billed as output tokens. A 10k-token budget used fully on a turn that would otherwise produce 500 output tokens is effectively a 20× cost bump on that turn. Don't reach for thinking on hot paths without measuring.
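The arithmetic behind that claim is worth making explicit, since it is the number you should compute before enabling thinking on any hot path:

```python
def turn_cost_multiplier(thinking_tokens: int, answer_tokens: int) -> float:
    """Output-token cost of a turn with thinking, relative to the same turn
    without it. Thinking tokens are billed as output tokens."""
    return (thinking_tokens + answer_tokens) / answer_tokens

# The example from the text: a fully used 10k budget on a 500-token answer.
print(turn_cost_multiplier(10_000, 500))  # 21.0 — roughly a 20x bump
```

In practice the budget is rarely used in full on easy turns, so measure the actual thinking-token usage from your responses rather than assuming the worst case.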
Where to turn it on
A good rule: turn on thinking when the model's failure mode is "committed to the wrong answer early." If your evals show Claude confidently returning plausible-but-wrong answers on reasoning-heavy problems, thinking is the lever.
Turn it off on tasks where the failure mode is something else — hallucinated facts, style mismatches, tool-use errors. Thinking doesn't cure hallucinations; it just buys the model more rope to reason with the facts it has.
Interplay with tools
Thinking mode composes with tool use. The model can think, call a tool, receive the result, think again, then answer. For agent loops on non-trivial problems this is often the biggest single accuracy win — the model uses its scratchpad to plan which tool to call next, not just what to say.
Be aware that thinking blocks can interleave with tool calls in the response. Treat thinking as first-class turn content; don't strip it out before sending the next turn.
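A minimal sketch of what "first-class turn content" means when assembling the next turn of an agent loop — block shapes follow the Messages-API content-block convention, but the helper itself is hypothetical:

```python
def next_turn_messages(history: list, assistant_blocks: list,
                       tool_results: list) -> list:
    """Extend the conversation after a tool call without stripping thinking.

    assistant_blocks is the content list from the API response and may
    contain "thinking", "text", and "tool_use" blocks; it is passed back
    verbatim rather than filtered down to text.
    """
    return history + [
        {"role": "assistant", "content": assistant_blocks},  # thinking kept
        {"role": "user", "content": tool_results},           # tool_result blocks
    ]
```

The common bug is a "tidy up the transcript" step that drops non-text blocks; that is exactly the step this helper refuses to do.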
A short heuristic
If you're writing "think step by step" in the prompt, you probably want extended thinking instead. The model does a better job using a real scratchpad than being nagged into simulating one.
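That heuristic is mechanical enough to automate. A sketch, assuming a default budget of 4k (an arbitrary choice) and a deliberately simple pattern match:

```python
import re

# Deliberately simple pattern; extend to match your own prompt phrasing.
STEP_BY_STEP = re.compile(r"think step[- ]by[- ]step", re.IGNORECASE)

def prefer_extended_thinking(prompt: str, budget_tokens: int = 4_000):
    """If the prompt nags the model to think step by step, drop the nag and
    return a thinking config requesting a real scratchpad instead.

    Returns (cleaned_prompt, thinking_config_or_None)."""
    if STEP_BY_STEP.search(prompt):
        cleaned = STEP_BY_STEP.sub("", prompt).strip(" :,\n")
        return cleaned, {"type": "enabled", "budget_tokens": budget_tokens}
    return prompt, None
```

Prompts without the phrase pass through untouched, so this can sit in front of every request without changing behavior elsewhere.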
* * *
Thanks for reading. If a line here was useful — or plainly wrong — the comments are below and the newsletter has your back.