The Loop  ·  Issue 017

A field journal of the AI frontier — for engineers who ship.

  Lab bench

Experiment №003
filed Apr 21, 2026

tool

Filed under

  • #pricing
  • #caching
  • #claude
  • #gpt
  • #gemini

What does this prompt actually cost?

Paste a prompt, pick a model, slide the cache-hit rate. See per-call, per-day, and per-month cost.

  Primer

Skip if you already know the theory; the interactive is right below.

API pricing is quoted per million tokens, with separate rates for input, for output, and (increasingly) for reads and writes of the prompt cache. The per-token rates look tiny until you multiply them by volume.
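
In code, the per-call math is just a sum of rate-times-volume terms. A minimal sketch in Python; the rates and the cached/written split below are placeholders, not any provider's actual schema:

    # Per-million-token rates, split the way providers quote them.
    # Placeholder numbers; check each provider's pricing page.
    RATES = {
        "input": 15.00,        # $ per 1M uncached input tokens
        "cache_read": 1.50,    # $ per 1M input tokens read from cache
        "cache_write": 18.75,  # $ per 1M input tokens written to cache
        "output": 75.00,       # $ per 1M output tokens
    }

    def per_call_cost(input_toks, output_toks, cached_toks=0, written_toks=0):
        """Dollar cost of one call; cached/written tokens are subsets of input."""
        fresh = input_toks - cached_toks - written_toks  # billed at the base rate
        return (
            fresh * RATES["input"]
            + cached_toks * RATES["cache_read"]
            + written_toks * RATES["cache_write"]
            + output_toks * RATES["output"]
        ) / 1_000_000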

This calculator counts tokens with the cl100k_base tokenizer (a good stand-in for most modern models) and runs the math across a small curated set of models. Everything happens in your browser. Prices are frozen to the date at the top of the breakdown — always verify against each provider's current pricing page before committing a budget to it.
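
Counting the input side is one library call. A sketch of the same count with the Python tiktoken package (the in-browser calculator would presumably use a port of the same encoding):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    prompt = "You are a helpful assistant. Summarize the following report."
    input_tokens = len(enc.encode(prompt))
    print(input_tokens)  # the count the cost math runs on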

▶  Try it

  Notes from the bench

What to watch for, why it matters, and the one thing that usually surprises people.

What moves the bill

Output is the expensive part

At Claude Opus 4.7's rates, an output token costs five times as much as an input token. Type a long prompt, set output to 100 tokens, and the input still dominates. Flip output to 5,000 tokens and output overtakes it immediately. If you're optimizing cost, shorter answers save more than shorter prompts.
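
The crossover is easy to check by hand. With illustrative rates at the same 5:1 ratio, say $15/M input and $75/M output:

    IN_RATE, OUT_RATE = 15.00, 75.00  # $/M tokens; illustrative 5:1 ratio

    def cost(inp, out):
        return (inp * IN_RATE + out * OUT_RATE) / 1_000_000

    print(cost(2000, 100))   # $0.0375 per call, $0.030 of it input
    print(cost(2000, 5000))  # $0.4050 per call, $0.375 of it output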

Cache hits move the needle

Anthropic's cached-read rate is one-tenth the regular input rate, so a 50% hit rate on a long prompt cuts the input side of the bill by almost half. On high-volume pipelines with stable system prompts, caching can cut the monthly bill by 40–60%. Slide the cache hit rate up and watch the per-call cost drop. Then remember that caching only works if the prompt prefix stays byte-for-byte identical across calls.
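
The blended input rate is one line of arithmetic. A sketch assuming the 0.1× read rate above and ignoring the write premium:

    def blended_input_rate(input_rate, hit_rate, read_mult=0.1):
        """Effective $/M input tokens at a given cache hit rate."""
        return hit_rate * input_rate * read_mult + (1 - hit_rate) * input_rate

    print(blended_input_rate(15.00, 0.5))  # 8.25, i.e. input cost down 45%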

Model tier > clever engineering

For identical input and output volumes, Haiku 4.5 is roughly 15× cheaper than Opus 4.7. The hardest cost-optimization question is usually "am I on the right tier?", not "can I shave 10% off the prompt?"
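
At volume, the tier multiplier dwarfs prompt trimming. A rough month-scale comparison; the call volume and rates are illustrative, and only the 15× ratio matters:

    CALLS_PER_DAY = 50_000
    TOKENS_PER_CALL = 3_000  # input + output combined, illustrative

    def monthly_cost(blended_rate_per_m):
        return CALLS_PER_DAY * 30 * TOKENS_PER_CALL * blended_rate_per_m / 1e6

    print(monthly_cost(30.00))  # Opus-tier blended rate:  $135,000 / month
    print(monthly_cost(2.00))   # Haiku-tier blended rate:   $9,000 / month
    # A 10% token trim on the Opus-tier bill saves at most $13,500;
    # the tier swap saves $126,000.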

Caveats

The calculator estimates input tokens with cl100k_base. Anthropic's tokenizer is close but not identical; assume ±10% error on the token count. Output token counts are whatever you specify. Cache prices model the common provider pattern (a cheap read rate plus a write premium); specific cache semantics differ per provider, so the number here is a planning estimate, not an SLA.

In a line

Client-side cost calculator across Claude, GPT, and Gemini tiers. Uses cl100k_base for token counts; models input, cache, and output prices with cache-hit-rate and call-volume sliders.

Other experiments

  • Exp 001 · How a sentence becomes tokens
  • Exp 002 · Temperature and top-p, visibly
  • Exp 004 · Tokens per second
  • Exp 005 · How far should the model think?
  • Exp 006 · Neural language vs a Markov chain
  • Exp 007 · What each token looks at
  • Exp 008 · Words in space
  • Exp 009 · The injection arena
  • Exp 010 · AI or human?
  • Exp 011 · Context Tetris
  • Exp 012 · Magnet flip