Experiment №007
filed Apr 21, 2026
explainer
Filed under
- #transformers
- #attention
- #interpretability
What each token looks at
Click a token. See which earlier tokens it attends to, as a row of weighted bars.
❂ Primer
Skip if you already know the theory; the interactive is right below.
Inside a transformer, every token at every layer decides which earlier tokens matter to it right now. The mechanism is attention: the token produces a "query" vector, each earlier token offers a "key" vector, and the scaled dot products, passed through a softmax, become the weights. Those weights determine how much of each earlier token's content gets mixed in when this one is updated.
The diagrams below are illustrative, not lifted from a specific trained model — but the patterns (pronouns pulling to referents, plural verbs jumping over singular distractors, quote marks binding together) are real, and they've been studied enough that interpreting attention by hand is a legitimate research skill.
▶ Try it
⁂ Notes from the bench
What to watch for, why it matters, and the one thing that usually surprises people.
Patterns worth naming
Referent tracking
In "The cat sat on the mat because it was warm", the pronoun it must bind to a noun. Click it in the example above — you'll see attention split between cat and mat, with most weight on whichever makes semantic sense. Bigger models disambiguate more confidently. Smaller ones hedge.
Long-distance agreement
"The keys to the cabinet are…" — the plural verb has to agree with the plural subject, not the closer singular distractor. In the trace, are reaches four tokens back to keys, almost entirely skipping cabinet. This pattern illustrates a key reason transformers displaced RNNs: an RNN compresses prior context into a fixed state that can lose track of which noun was plural, while attention preserves the lookup.
Symbol resolution in code
Attention patterns on code are some of the cleanest. A function call attends back to its definition. A variable attends back to its binding. If you've ever wondered how a model "knows" which function you meant when you typed fib(n-1), this is roughly it.
Caveats
Real transformers have many heads per layer, each specializing in different patterns — some attend to the previous token, some to punctuation, some to whichever token comes first in a list. Modern interpretability work usually looks at specific heads at specific layers, not the aggregate. The patterns here are aggregates for the sake of legibility.
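The cost of aggregating can be sketched with two hypothetical heads attending from one query token over four earlier tokens. The weights here are invented for illustration: one head implements a "previous token" pattern, the other a "first token" pattern, and averaging them produces a blur that is neither.

```python
# Hypothetical per-head attention rows for a single query token
# over four earlier tokens (each row sums to 1).
head0 = [0.05, 0.05, 0.10, 0.80]  # attends mostly to the previous token
head1 = [0.85, 0.05, 0.05, 0.05]  # attends mostly to the first token

# An aggregate view averages across heads. It is still a valid
# distribution, but both sharp patterns are washed out.
aggregate = [(a + b) / 2 for a, b in zip(head0, head1)]
```

This is why interpretability work inspects individual heads: the mean over heads can look diffuse even when every head is doing something crisp.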
Attention also isn't the whole story. The residual stream, MLP layers, and layer normalization all shape what a token ends up representing. Attention tells you what a token looked at, not what it concluded.
In a line
Four hand-calibrated attention examples showing referent tracking, long-distance subject-verb agreement, symbol resolution in code, and quotation binding.
Other experiments
- Exp 001
How a sentence becomes tokens
- Exp 002
Temperature and top-p, visibly
- Exp 003
What does this prompt actually cost?
- Exp 004
Tokens per second
- Exp 005
How far should the model think?
- Exp 006
Neural language vs a Markov chain
- Exp 008
Words in space
- Exp 009
The injection arena
- Exp 010
AI or human?
- Exp 011
Context Tetris
- Exp 012
Magnet flip