What does this prompt actually cost?
Paste a prompt, pick a model, slide the cache-hit rate. See per-call, per-day, and per-month cost.
❂ Primer
Skip if you already know the theory; the interactive is right below.
API pricing is quoted per million tokens — separately for input, for output, and (increasingly) for reads and writes of the prompt cache. The per-token rates look modest until you multiply by call volume.
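The per-call arithmetic can be sketched in a few lines. The rates below are illustrative placeholders, not any provider's current prices; the structure (separate input, output, cache-read, and cache-write rates, all per million tokens) is what matters.

```python
# Illustrative rates in USD per million tokens -- NOT current prices.
RATES = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}

def call_cost(input_tokens, output_tokens, cached_tokens=0, written_tokens=0):
    """USD cost of one API call under the rate card above."""
    uncached = input_tokens - cached_tokens  # tokens billed at the full input rate
    return (
        uncached * RATES["input"]
        + cached_tokens * RATES["cache_read"]
        + written_tokens * RATES["cache_write"]
        + output_tokens * RATES["output"]
    ) / 1_000_000

# A 2,000-token prompt with a 500-token answer, no caching:
print(f"${call_cost(2000, 500):.4f}")                       # $0.0135 per call
print(f"${call_cost(2000, 500) * 1000 * 30:.2f} per month")  # $405.00 at 1k calls/day
```

The point of the second print: a cost that rounds to a penny per call still compounds into real money at pipeline volume.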
This calculator counts tokens with the cl100k_base tokenizer (a good stand-in for most modern models) and runs the math across a small curated set of models. Everything happens in your browser. Prices are frozen to the date at the top of the breakdown — always verify against each provider's current pricing page before committing a budget to it.
▶ Try it
Loading pricing…
⁂ Notes from the bench
What to watch for, why it matters, and the one thing that usually surprises people.
What moves the bill
Output is the expensive part
At Claude Opus 4.7's rates, an output token costs five times as much as an input token. Type a long prompt, set output to 100 tokens, and the input still dominates. Flip output to 5000 tokens — output overtakes it immediately. If you're optimizing cost, shorter answers save more than shorter prompts.
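The crossover is easy to locate. Assuming output costs 5× input (rates below are illustrative), output dominates the bill as soon as the answer is longer than one-fifth of the prompt:

```python
# Assumed rates, USD per million tokens; output = 5x input as in the text.
IN_RATE, OUT_RATE = 3.00, 15.00

def dominant_side(input_tokens, output_tokens):
    """Which side of the call contributes more to its cost."""
    return "output" if output_tokens * OUT_RATE > input_tokens * IN_RATE else "input"

print(dominant_side(4000, 100))   # input  -- long prompt, short answer
print(dominant_side(4000, 5000))  # output -- long answer wins immediately
# Break-even: output_tokens == input_tokens * IN_RATE / OUT_RATE (here, 1/5 of the prompt)
```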
Cache hits move the needle
Anthropic's cached-read price is one-tenth the input rate, so a 50% cache hit rate on a long prompt cuts the input side by roughly 45%. On high-volume pipelines with stable system prompts, caching can cut the monthly bill by 40–60%. Slide the cache hit rate up and watch the per-call cost drop. Then remember that caching requires the prompt prefix to stay byte-for-byte identical across calls.
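The blended input rate is a simple weighted average. With an assumed read price of one-tenth the input rate (illustrative numbers, matching the common provider pattern):

```python
# Assumed rates, USD per million tokens; cached reads at 1/10 the input rate.
INPUT_RATE, CACHE_READ_RATE = 3.00, 0.30

def effective_input_rate(hit_rate):
    """Average per-million-token cost of input at a given cache hit rate (0..1)."""
    return hit_rate * CACHE_READ_RATE + (1 - hit_rate) * INPUT_RATE

print(f"{effective_input_rate(0.0):.2f}")  # 3.00 -- no caching
print(f"{effective_input_rate(0.5):.2f}")  # 1.65 -- 45% off the input side
print(f"{effective_input_rate(0.9):.2f}")  # 0.57 -- near the floor
```

Note what this does and does not touch: the hit rate only discounts input; output tokens are billed at full price regardless.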
Model tier > clever engineering
At identical input+output, Haiku 4.5 is roughly 15× cheaper than Opus 4.7. The hardest cost-optimization question is usually "am I on the right tier?" — not "can I shave 10% off the prompt?"
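Running the same workload through two rate cards makes the tier gap concrete. The numbers below are illustrative stand-ins for an Opus-class and a Haiku-class tier, not published prices:

```python
# Two hypothetical rate cards, USD per million tokens -- NOT published prices.
TIERS = {
    "opus-class":  {"in": 15.00, "out": 75.00},
    "haiku-class": {"in": 1.00,  "out": 5.00},
}

def monthly_cost(tier, input_tokens, output_tokens, calls_per_day):
    """Monthly USD cost for a fixed per-call workload on one tier."""
    r = TIERS[tier]
    per_call = (input_tokens * r["in"] + output_tokens * r["out"]) / 1_000_000
    return per_call * calls_per_day * 30

# 2,000 tokens in, 500 out, 1,000 calls/day:
print(f"${monthly_cost('opus-class', 2000, 500, 1000):.2f}")   # $2025.00
print(f"${monthly_cost('haiku-class', 2000, 500, 1000):.2f}")  # $135.00 -- 15x less
```

No amount of prompt-trimming on the expensive tier closes a gap that size; the tier decision dominates.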
Caveats
Input tokens are estimated with cl100k_base. Anthropic's tokenizer is close but not identical — assume ±10% error on the token count. Output token counts are whatever you specify. Cache prices model the common provider pattern (a discounted read rate plus a premium write rate); specific cache semantics differ per provider, so the number here is a planning estimate, not an SLA.
In a line
Client-side cost calculator across Claude, GPT, and Gemini tiers. Uses cl100k_base for token counts; models input/cache/output prices with a cache-hit rate and call volume slider.
Other experiments
- Exp 001
How a sentence becomes tokens
- Exp 002
Temperature and top-p, visibly
- Exp 004
Tokens per second
- Exp 005
How far should the model think?
- Exp 006
Neural language vs a Markov chain
- Exp 007
What each token looks at
- Exp 008
Words in space
- Exp 009
The injection arena
- Exp 010
AI or human?
- Exp 011
Context Tetris
- Exp 012
Magnet flip