The Loop  ·  Issue 017

The Loop

A field journal of the AI frontier — for engineers who ship.

  Lab bench

Experiment №002
filed Apr 21, 2026

explainer

Filed under

  • #sampling
  • #temperature
  • #top-p
  • #softmax

Temperature and top-p, visibly

Move the dials. Watch a probability distribution collapse, flatten, or get its tail trimmed off.

  Primer

Skip if you already know the theory; the interactive is right below.

When a model generates the next token, it doesn't return a single answer. It returns a probability distribution over its entire vocabulary — roughly 100,000 numbers that sum to 1. The "response" you see is a weighted die roll on top of those numbers.

Temperature sharpens or flattens that distribution. Divide every logit by T before the softmax: as T approaches 0 the distribution collapses onto the top token (greedy, deterministic), at T=1 you sample from the model's raw distribution, and at T=2 the curve flattens until rare tokens become plausible. Top-p (also called nucleus sampling) is a cutoff: keep only the smallest set of tokens whose cumulative probability reaches p, throw the rest away, renormalize, and sample from what's left.
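
Here's a minimal sketch of both knobs in code, assuming a toy list of logits invented for illustration (nothing below comes from a real model):

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Temperature-then-nucleus sampling over a vector of next-token logits."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)

    if temperature <= 0:
        # Limit case: greedy decoding, always the single most likely token.
        return int(np.argmax(logits))

    # Temperature: divide every logit by T, then softmax.
    scaled = logits / temperature
    scaled -= scaled.max()                        # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()

    # Top-p: keep the smallest set of tokens whose cumulative mass reaches p.
    order = np.argsort(probs)[::-1]               # most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    nucleus = order[:cutoff]

    # Renormalize inside the nucleus, then roll the weighted die.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))

# Five made-up logits standing in for a 100,000-token vocabulary.
logits = [5.0, 3.2, 2.9, 1.0, -1.0]
print(sample_token(logits, temperature=0.7, top_p=0.9))
```

Lowering temperature concentrates mass on the first entry; lowering top_p trims the tail before the draw. That is exactly the reshaping the widget below animates.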

The distributions below aren't invented — they're hand-calibrated to the shape real models actually produce. Low-entropy prompts (a factual question, a well-known quote) concentrate most mass on one or two tokens. Open-ended prompts spread it out. The sliders below let you see how each knob reshapes a fixed starting distribution.

▶  Try it

[Interactive: five prompt distributions, reshaped live by the temperature and top-p sliders]

  Notes from the bench

What to watch for, why it matters, and the one thing that usually surprises people.

Things to try

1. T=0 on "The capital of France is"

Slide temperature to 0 on the capital-of-France prompt. The bar chart collapses to a single bar. This is why factual questions at low temperature feel "reliable": the model has nowhere to go but its top-ranked token, which for a prompt like this is the right answer.

2. T=2 on any prompt

Crank temperature to 2. Watch the bars level out. Sample a few tokens — you'll get a mix of reasonable and weird. This is the "creativity" regime, and also the "hallucination" regime. They're the same setting.

3. Top-p at 0.3 on the open-ended prompt

Pick the "Once upon a time…" prompt, leave T at 1, slide top-p down to 0.3. Most tokens get struck through — they're out of the nucleus. This is why top-p often produces better-feeling samples than temperature alone: it bounds how weird the model can get, even at higher T.

4. Entropy as a thermometer

The entropy readout next to the chart measures how "spread out" the distribution is, in bits. Factual prompts sit near 0.1 bits. Creative prompts at T=1 might be 3–4 bits. At max T with top-p=1, entropy approaches log₂(vocab), which is pure noise. Entropy is the one number that summarizes whether a given step is a confident call or a risky roll; the sketch after this list shows how it's computed.
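
The entropy number itself is a one-liner. A quick sketch with made-up distributions shaped like the factual and open-ended prompts above (illustrative numbers, not the experiment's actual data):

```python
import numpy as np

def entropy_bits(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero entries."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Peaked, "capital of France"-style distribution: almost all mass on one token.
factual = [0.985, 0.010, 0.003, 0.002]
# Flatter, "Once upon a time"-style distribution at T=1.
creative = [0.18, 0.15, 0.12, 0.10, 0.09, 0.08, 0.08, 0.07, 0.07, 0.06]

print(f"factual : {entropy_bits(factual):.2f} bits")   # ~0.13 bits
print(f"creative: {entropy_bits(creative):.2f} bits")  # ~3.2 bits
# A uniform roll over a ~100,000-token vocabulary would be log2(100000) ≈ 16.6 bits.
```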

What you actually ship with

Most APIs let you set temperature (0–1 or 0–2) and top-p (0–1). Reasoning and coding APIs default to very low T for a reason: you want the one right continuation, not a creative variant. Creative-writing APIs go the other way.
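
For concreteness, here is where the two knobs sit in one common request shape (an OpenAI-style chat completion; the model name, parameter ranges, and defaults vary by provider, so treat this as an assumption about one SDK rather than a universal interface):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Low temperature + full nucleus: the "one right continuation" configuration.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name; use whatever you actually ship with
    messages=[{"role": "user", "content": "The capital of France is"}],
    temperature=0.0,      # collapse toward the top token
    top_p=1.0,            # no tail trimming
)
print(response.choices[0].message.content)
```

A common creative-writing setup goes the other way: temperature up toward 1 or above, top_p around 0.9 so the tail stays trimmed even as the curve flattens.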

A model at T=0 isn't being thoughtful. It's being greedy. The deterministic answer and the sampled one are both fine answers; they just aren't the same answer.

In a line

Hand-calibrated distributions for five prompts, reshaped live with softmax-over-temperature and nucleus truncation. Includes a sample button for stochastic rolls and an entropy readout for the shape of each decision.

Other experiments

  1. Exp 001 · How a sentence becomes tokens
  2. Exp 003 · What does this prompt actually cost?
  3. Exp 004 · Tokens per second
  4. Exp 005 · How far should the model think?
  5. Exp 006 · Neural language vs a Markov chain
  6. Exp 007 · What each token looks at
  7. Exp 008 · Words in space
  8. Exp 009 · The injection arena
  9. Exp 010 · AI or human?
  10. Exp 011 · Context Tetris
  11. Exp 012 · Magnet flip