The Loop  ·  Issue 017

A field journal of the AI frontier — for engineers who ship.

  Lab bench

Experiment №008
filed Apr 21, 2026

explainer

Filed under

  • #embeddings
  • #vectors
  • #rag
  • #semantic-search

Words in space

A hand-placed 2D layout of 80 AI-industry words. Click any to find its nearest neighbors.

  Primer

Skip if you already know the theory; the interactive is right below.

An embedding is a word (or sentence, or image) represented as a vector of maybe 1536 numbers. Words used in similar contexts end up near each other in that high-dimensional space — and that's the whole reason embeddings are useful. Every RAG pipeline, every "find me the related article" feature, every semantic search box is built on the same intuition: nearness in embedding space ≈ nearness in meaning.
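
To make "near" concrete: the standard measure is cosine similarity, and a few lines of numpy are enough to see it. The vectors below are made up, a real model would hand you hundreds or thousands of dimensions, but the geometry works identically.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine of the angle between two vectors: near 1.0 means "points the
        # same way", near 0.0 means roughly unrelated.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Toy 4-dimensional vectors, hand-written for illustration.
    vectors = {
        "cat":    np.array([0.9, 0.1, 0.0, 0.2]),
        "dog":    np.array([0.8, 0.2, 0.1, 0.3]),
        "tensor": np.array([0.1, 0.9, 0.8, 0.0]),
    }

    print(cosine_similarity(vectors["cat"], vectors["dog"]))     # ~0.98, close
    print(cosine_similarity(vectors["cat"], vectors["tensor"]))  # ~0.16, far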

The map below isn't a real UMAP or t-SNE projection of a production embedding model — those tend to be smeary and hard to read. Instead, it's a hand-placed 2D layout of 80 AI-industry words, arranged by category so the intuition is legible at a glance.
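
For contrast, producing a real projection is roughly a one-liner once you have the vectors. A sketch assuming scikit-learn is available, with random numbers standing in for actual model output:

    import numpy as np
    from sklearn.manifold import TSNE

    # Stand-in for real model output: one 1536-dim vector per word.
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(80, 1536))

    # Project to 2D. perplexity trades off local vs global structure and is
    # the knob people fiddle with most.
    coords = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(embeddings)
    print(coords.shape)  # (80, 2): x/y positions ready for a scatter plot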

▶  Try it

[Interactive: the 80-word embedding map renders here.]

  Notes from the bench

What to watch for, why it matters, and the one thing that usually surprises people.

The four clusters

Four loose clusters. Models & labs (Claude, GPT, OpenAI, Anthropic) upper-left. Training concepts (transformer, embedding, RLHF, gradient) lower-left. Runtime/inference (API, latency, cache, GPU) upper-right. Evaluation & safety (benchmark, alignment, jailbreak) lower-right. Words like "prompt", "agent", and "retrieval" sit near the center because they bridge categories.

Click a word to see its five nearest neighbors. Click one of the neighbors to jump. The interesting move is tracing a path between distant clusters — start at H100, click to GPU, then inference, then latency, then streaming. You've walked a concept graph without knowing it.
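
That traversal is nothing more than repeated nearest-neighbor lookup. A sketch, with random vectors standing in for real embeddings and a tiny made-up vocabulary:

    import numpy as np

    def nearest(word: str, vectors: dict[str, np.ndarray], k: int = 5) -> list[str]:
        # Rank every other word by cosine similarity to the clicked one.
        q = vectors[word] / np.linalg.norm(vectors[word])
        sims = {w: float(q @ (v / np.linalg.norm(v)))
                for w, v in vectors.items() if w != word}
        return sorted(sims, key=sims.get, reverse=True)[:k]

    def walk(start: str, vectors: dict[str, np.ndarray], steps: int = 4) -> list[str]:
        # "Clicking through" the map: hop to the closest word not yet visited.
        path = [start]
        for _ in range(steps):
            nxt = next(w for w in nearest(path[-1], vectors) if w not in path)
            path.append(nxt)
        return path

    # Random stand-ins; a real run would use vectors from an embedding model.
    rng = np.random.default_rng(0)
    words = ["H100", "GPU", "inference", "latency", "streaming", "cache", "API"]
    vectors = {w: rng.normal(size=16) for w in words}
    print(walk("H100", vectors))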

Why RAG rides on this

In production, "nearest neighbors" is usually implemented by a vector database (Pinecone, Chroma, pgvector, Turbopuffer) that stores embeddings and returns the k closest to a query. The query might be a user's question, the k might be 5 passages from your docs, and the system prompts a language model with those passages. That's retrieval-augmented generation, and the whole thing rests on this same "close in space means related" assumption.
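
Stripped to its skeleton, that loop fits in a screenful. The sketch below is runnable but deliberately toy: a list-backed class plays the vector database, and a hashed bag-of-words function stands in for the embedding model; the last step, handing the prompt to an LLM client, is whatever API you actually use.

    import numpy as np

    # List-backed stand-in for a vector database (Pinecone, Chroma, pgvector...).
    class VectorStore:
        def __init__(self) -> None:
            self.items: list[tuple[str, np.ndarray]] = []

        def add(self, text: str, vec: np.ndarray) -> None:
            self.items.append((text, vec / np.linalg.norm(vec)))

        def search(self, query: np.ndarray, top_k: int = 5) -> list[str]:
            q = query / np.linalg.norm(query)
            ranked = sorted(self.items, key=lambda it: float(q @ it[1]), reverse=True)
            return [text for text, _ in ranked[:top_k]]

    def embed(text: str) -> np.ndarray:
        # Hypothetical embedding model: a hashed bag-of-words keeps the sketch
        # runnable with no model at all. Swap in a real embedding API here.
        vec = np.zeros(64)
        for token in text.lower().split():
            vec[hash(token) % 64] += 1.0
        return vec

    store = VectorStore()
    for doc in ["Latency is dominated by time to first token.",
                "pgvector adds similarity search over vectors to Postgres."]:
        store.add(doc, embed(doc))

    question = "How do I do similarity search in Postgres?"
    context = "\n\n".join(store.search(embed(question), top_k=1))
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
    # Final step of RAG: hand `prompt` to whatever LLM client you use.
    print(prompt)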

When RAG works, it's because a good embedding model put the right passages near your query. When it doesn't, the embedding space usually has the wrong geometry for the job: some distinction your domain depends on simply isn't represented. "Use better embeddings" is sometimes the right answer, which is why embedding models are a competitive market in their own right.

In a line

Illustrative embedding-space tour organized by cluster (models, training, runtime, evaluation, bridge concepts). Click-to-traverse via nearest-neighbor lookups.

Other experiments

  1. Exp 001 · How a sentence becomes tokens
  2. Exp 002 · Temperature and top-p, visibly
  3. Exp 003 · What does this prompt actually cost?
  4. Exp 004 · Tokens per second
  5. Exp 005 · How far should the model think?
  6. Exp 006 · Neural language vs a Markov chain
  7. Exp 007 · What each token looks at
  8. Exp 009 · The injection arena
  9. Exp 010 · AI or human?
  10. Exp 011 · Context Tetris
  11. Exp 012 · Magnet flip