The Loop  ·  Issue 017

A field journal of the AI frontier — for engineers who ship.

  Lab bench

Experiment №003
filed Apr 21, 2026

tool

Filed under

  • #pricing
  • #caching
  • #claude
  • #gpt
  • #gemini

What does this prompt actually cost?

Paste a prompt, pick a model, slide the cache-hit rate. See per-call, per-day, and per-month cost.

  Primer

Skip if you already know the theory; the interactive is right below.

API pricing is quoted per million tokens, with separate rates for input, for output, and (increasingly) for reads and writes of the prompt cache. The per-token rates look tiny until you multiply them by volume.
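
In code, the per-call math is just a sum of rate-times-volume terms. A minimal sketch in Python; the rates and the cached/written split below are placeholders, not any provider's actual schema:

    # Per-million-token rates, split the way providers quote them.
    # Placeholder numbers; check each provider's pricing page.
    RATES = {
        "input": 15.00,        # $ per 1M uncached input tokens
        "cache_read": 1.50,    # $ per 1M input tokens read from cache
        "cache_write": 18.75,  # $ per 1M input tokens written to cache
        "output": 75.00,       # $ per 1M output tokens
    }

    def per_call_cost(input_toks, output_toks, cached_toks=0, written_toks=0):
        """Dollar cost of one call; cached/written tokens are subsets of input."""
        fresh = input_toks - cached_toks - written_toks  # billed at the base rate
        return (
            fresh * RATES["input"]
            + cached_toks * RATES["cache_read"]
            + written_toks * RATES["cache_write"]
            + output_toks * RATES["output"]
        ) / 1_000_000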

This calculator counts tokens with the cl100k_base tokenizer (a good stand-in for most modern models) and runs the math across a small curated set of models. Everything happens in your browser. Prices are frozen to the date at the top of the breakdown — always verify against each provider's current pricing page before committing a budget to it.
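
Counting the input side is one library call. A sketch of the same count with the Python tiktoken package (the in-browser calculator would presumably use a port of the same encoding):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    prompt = "You are a helpful assistant. Summarize the following report."
    input_tokens = len(enc.encode(prompt))
    print(input_tokens)  # the count the cost math runs on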

▶  Try it

  Notes from the bench

What to watch for, why it matters, and the one thing that usually surprises people.

What moves the bill

Output is the expensive part

At Claude Opus 4.7's rates, an output token costs five times as much as an input token. Type a long prompt, set output to 100 tokens, and the input still dominates. Flip output to 5,000 tokens and output overtakes it immediately. If you're optimizing cost, shorter answers save more than shorter prompts.
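
The crossover is easy to check by hand. With illustrative rates at the same 5:1 ratio, say $15/M input and $75/M output:

    IN_RATE, OUT_RATE = 15.00, 75.00  # $/M tokens; illustrative 5:1 ratio

    def cost(inp, out):
        return (inp * IN_RATE + out * OUT_RATE) / 1_000_000

    print(cost(2000, 100))   # $0.0375 per call, $0.030 of it input
    print(cost(2000, 5000))  # $0.4050 per call, $0.375 of it output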

Cache hits move the needle

Anthropic's cached-read rate is one-tenth the regular input rate, so a 50% hit rate on a long prompt cuts the input side of the bill by almost half. On high-volume pipelines with stable system prompts, caching can cut the monthly bill by 40–60%. Slide the cache hit rate up and watch the per-call cost drop. Then remember that caching only works if the prompt prefix stays byte-for-byte identical across calls.
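
The blended input rate is one line of arithmetic. A sketch assuming the 0.1× read rate above and ignoring the write premium:

    def blended_input_rate(input_rate, hit_rate, read_mult=0.1):
        """Effective $/M input tokens at a given cache hit rate."""
        return hit_rate * input_rate * read_mult + (1 - hit_rate) * input_rate

    print(blended_input_rate(15.00, 0.5))  # 8.25, i.e. input cost down 45%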

Model tier > clever engineering

For identical input and output volumes, Haiku 4.5 is roughly 15× cheaper than Opus 4.7. The hardest cost-optimization question is usually "am I on the right tier?", not "can I shave 10% off the prompt?"
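
At volume, the tier multiplier dwarfs prompt trimming. A rough month-scale comparison; the call volume and rates are illustrative, and only the 15× ratio matters:

    CALLS_PER_DAY = 50_000
    TOKENS_PER_CALL = 3_000  # input + output combined, illustrative

    def monthly_cost(blended_rate_per_m):
        return CALLS_PER_DAY * 30 * TOKENS_PER_CALL * blended_rate_per_m / 1e6

    print(monthly_cost(30.00))  # Opus-tier blended rate:  $135,000 / month
    print(monthly_cost(2.00))   # Haiku-tier blended rate:   $9,000 / month
    # A 10% token trim on the Opus-tier bill saves at most $13,500;
    # the tier swap saves $126,000.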

Caveats

The calculator estimates input tokens with cl100k_base. Anthropic's tokenizer is close but not identical; assume ±10% error on the token count. Output token counts are whatever you specify. Cache prices model the common provider pattern (a cheap read rate plus a write premium); specific cache semantics differ per provider, so the number here is a planning estimate, not an SLA.

In a line

Client-side cost calculator across Claude, GPT, and Gemini tiers. Uses cl100k_base for token counts; models input, cache, and output prices with cache-hit-rate and call-volume sliders.

Other experiments

  • Exp 001 · How a sentence becomes tokens
  • Exp 002 · Temperature and top-p, visibly
  • Exp 004 · Tokens per second
  • Exp 005 · How far should the model think?
  • Exp 006 · Neural language vs a Markov chain
  • Exp 007 · What each token looks at
  • Exp 008 · Words in space
  • Exp 009 · The injection arena
  • Exp 010 · AI or human?
  • Exp 011 · Context Tetris
  • Exp 012 · Magnet flip