Experiment №008
filed Apr 21, 2026
explainer
Filed under
- #embeddings
- #vectors
- #rag
- #semantic-search
Words in space
A hand-placed 2D layout of 80 AI-industry words. Click any to find its nearest neighbors.
❂ Primer
Skip if you already know the theory; the interactive is right below.
An embedding is a word (or sentence, or image) represented as a vector of maybe 1536 numbers. Words used in similar contexts end up near each other in that high-dimensional space — and that's the whole reason embeddings are useful. Every RAG pipeline, every "find me the related article" feature, every semantic search box is built on the same intuition: nearness in embedding space ≈ nearness in meaning.
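Here's a minimal sketch of that "nearness" measure as most systems compute it, cosine similarity. The four-dimensional vectors below are invented for illustration; a real embedding model (like the 1536-dimensional kind mentioned above) would produce them for you.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" (real models use hundreds or thousands of
# dimensions; these numbers are made up for illustration).
cat    = np.array([0.90, 0.80, 0.1, 0.00])
kitten = np.array([0.85, 0.75, 0.2, 0.05])
gpu    = np.array([0.00, 0.10, 0.9, 0.80])

print(cosine_similarity(cat, kitten))  # high: used in similar contexts
print(cosine_similarity(cat, gpu))     # low: unrelated concepts
```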
The map below isn't a real UMAP or t-SNE projection of a production embedding model — those tend to be smeary and hard to read. Instead, it's a hand-placed 2D layout of 80 AI-industry words, arranged by category so the intuition is legible at a glance.
▶ Try it
[Interactive map: 80 hand-placed words. Click a word to highlight its five nearest neighbors; click a neighbor to jump there.]
⁂ Notes from the bench
What to watch for, why it matters, and the one thing that usually surprises people.
The four clusters
The map settles into four loose clusters:
- Models & labs (Claude, GPT, OpenAI, Anthropic), upper-left.
- Training concepts (transformer, embedding, RLHF, gradient), lower-left.
- Runtime/inference (API, latency, cache, GPU), upper-right.
- Evaluation & safety (benchmark, alignment, jailbreak), lower-right.
Words like "prompt", "agent", and "retrieval" sit near the center because they bridge categories.
Click a word to see its five nearest neighbors. Click one of the neighbors to jump. The interesting move is tracing a path between distant clusters — start at H100, click to GPU, then inference, then latency, then streaming. You've walked a concept graph without knowing it.
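If you want the mechanics without the page, here's a sketch of the same lookup: a handful of hypothetical hand-placed coordinates and a nearest-neighbor function that sorts by Euclidean distance. The coordinates and the greedy walk are invented for illustration; only the shape of the computation matches the interactive.

```python
import math

# A hypothetical slice of the hand-placed layout: word -> (x, y).
# Coordinates are invented; the real page has 80 of these.
layout = {
    "H100":      (8.0, 9.0),
    "GPU":       (7.5, 8.5),
    "inference": (7.0, 7.8),
    "latency":   (6.8, 7.0),
    "streaming": (6.5, 6.2),
    "prompt":    (5.0, 5.0),
}

def nearest_neighbors(word: str, k: int = 5) -> list[str]:
    """Return the k words closest to `word` by Euclidean distance."""
    here = layout[word]
    others = [w for w in layout if w != word]
    others.sort(key=lambda w: math.dist(here, layout[w]))
    return others[:k]

# Walking the concept graph: repeatedly hop to the closest unvisited word.
path, current = ["H100"], "H100"
while len(path) < len(layout):
    current = next(w for w in nearest_neighbors(current, k=len(layout))
                   if w not in path)
    path.append(current)
print(" -> ".join(path))  # e.g. H100 -> GPU -> inference -> latency -> ...
```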
Why RAG rides on this
In production, "nearest neighbors" is usually implemented by a vector database (Pinecone, Chroma, pgvector, Turbopuffer) that stores embeddings and returns the k closest to a query. The query might be a user's question, the k results might be five passages from your docs, and the system prompts a language model with those passages. That's retrieval-augmented generation, and the whole thing rests on the same "close in space means related" assumption.
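A sketch of that loop, under loud assumptions: `embed` below is a fake, hash-seeded stand-in (a real pipeline calls an embedding model, and the vector database does the top-k search instead of a NumPy dot product), so the ranking here is meaningless; only the pipeline shape is the point.

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """FAKE embedder: hash-seeded random unit vector. A real pipeline calls
    an embedding model here; with this stand-in, similarities are arbitrary."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(1536)
    return v / np.linalg.norm(v)

# "Index" your docs. In production a vector database stores these rows.
docs = [
    "Embeddings map text to vectors.",
    "RAG retrieves passages before generation.",
    "GPUs accelerate inference.",
]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Top-k passages by cosine similarity (unit vectors, so dot = cosine)."""
    scores = index @ embed(query)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

passages = retrieve("How does retrieval-augmented generation work?")
prompt = "Answer using only these passages:\n" + "\n".join(passages)
# ...then send `prompt` to the language model of your choice.
```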
When RAG works, it's because a good embedding model put the right passages near your query. When it doesn't, the embedding space usually had the wrong geometry — some dimension you needed for this domain wasn't represented. "Use better embeddings" is sometimes the right answer, which is why embedding models are a competitive market in their own right.
In a line
Illustrative embedding-space tour organized by cluster (models, training, runtime, evaluation, bridge concepts). Click-to-traverse via nearest-neighbor lookups.
Other experiments
- Exp 001
How a sentence becomes tokens
- Exp 002
Temperature and top-p, visibly
- Exp 003
What does this prompt actually cost?
- Exp 004
Tokens per second
- Exp 005
How far should the model think?
- Exp 006
Neural language vs a Markov chain
- Exp 007
What each token looks at
- Exp 009
The injection arena
- Exp 010
AI or human?
- Exp 011
Context Tetris
- Exp 012
Magnet flip