The Loop  ·  Issue 016


A field journal of the AI frontier — for engineers who ship.

§ Guides

By AI Blog Editor
Apr 17, 2026 · 1 min read

RAG on two axes: size × freshness

Place your project on the grid and the right technique is usually one of five. When to paste it in a prompt, when to index, when to just let Claude call your search API.

Retrieval-augmented generation is often treated as a single architecture — chunk, embed, index, search, prompt — when it's really a family of techniques whose fit depends on your data's shape. Two axes do most of the disambiguation: how big the corpus is, and how fresh it needs to be.

Place your project on those two axes and the right answer is usually one of five things. Get an axis wrong and you end up either over-engineering (a full RAG stack for a 40k-token document that could've fit in a prompt) or under-engineering (pasting a 10 MB corpus into every turn and wondering why you hit rate limits).

Axis one: size

Small means the whole corpus fits comfortably in a prompt — tens of thousands of tokens, maybe a few hundred thousand with a long-context model. Don't index it. Stuff it in the system prompt, cache the prefix, call it done.
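A minimal sketch of the "stuff it in the prompt" path. The ~4-characters-per-token estimate and the 150k budget are assumptions, not measurements; in practice you'd count tokens with your model's tokenizer.

```python
def rough_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English text."""
    return len(text) // 4

def build_system_prompt(corpus: str, instructions: str,
                        budget_tokens: int = 150_000) -> str:
    """Put the whole corpus in the system prompt if it fits the budget.

    Raises if it doesn't -- that's the signal to move to indexing.
    """
    if rough_tokens(corpus) > budget_tokens:
        raise ValueError("corpus too large to prompt-stuff; index it instead")
    # Corpus first, instructions last: a long stable prefix caches well.
    return f"<documents>\n{corpus}\n</documents>\n\n{instructions}"
```

Keeping the corpus at the front of the prompt matters: prefix caching only pays off if the cached part is byte-identical across turns.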

Medium is a few million tokens — too much for every turn, but small enough that you can afford a real index. Standard RAG territory: embeddings, a vector store, a retrieval step, a reranker if you're fancy.
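The retrieval step above, reduced to its skeleton. The word-count "embedding" here is a stand-in so the example runs on its own; a real pipeline would call an embedding model and a vector store.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag of words. Swap in a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Embed the query, rank chunks by similarity, return the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```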

Large is everything past that. You need a proper retrieval stack: chunking strategy, hybrid search (dense + BM25), versioned indexes, incremental ingest, and the operational muscle to run it.
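One common way to combine the dense and BM25 sides of a hybrid search is reciprocal rank fusion, sketched below. It merges two rankings without having to normalize their score scales; k=60 is the conventional constant from the RRF literature.

```python
def rrf_fuse(dense_ranking: list[str], sparse_ranking: list[str],
             k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each doc scores 1/(k + rank) per list
    it appears in, then docs are sorted by total score."""
    scores: dict[str, float] = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that ranks decently in both lists beats one that tops a single list, which is usually what you want from hybrid retrieval.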

Axis two: freshness

Static. The corpus rarely changes. A product manual, a coding standard, a catalogue of legal templates. Index once, reindex rarely.

Daily. Updates come in batches — a nightly ETL, a scheduled scrape. Reindex on a cron; plan for idempotent ingest so reruns are safe.
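Idempotent ingest can be as simple as keying every chunk by a hash of its content, as in this sketch (the plain-dict "index" stands in for whatever store you actually use):

```python
import hashlib

def ingest(index: dict[str, str], docs: list[str]) -> int:
    """Upsert chunks keyed by a content hash.

    Re-running the same batch writes the same keys, so duplicates are
    impossible and reruns are safe. Returns the number of new chunks.
    """
    added = 0
    for doc in docs:
        key = hashlib.sha256(doc.encode()).hexdigest()
        if key not in index:
            added += 1
        index[key] = doc
    return added
```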

Live. The world changes minute-to-minute. Stock prices, open tickets, active chats. Pre-indexing loses to live queries almost immediately — by the time the index finishes, the answer has already changed.

The grid

Pick a cell: rows are SMALL / MEDIUM / LARGE, columns are STATIC / DAILY / LIVE. In the interactive version of this post you click a cell and the recommendation appears. One example, Medium × Daily (a few million tokens, batch updates): retrieval with a scheduled reindex. RAG with a nightly reindex job; make sure your ingest pipeline is idempotent, because reruns happen.

Retrieval-as-tool-use

One of the more useful shifts of the last couple of years: instead of pre-indexing everything for Claude, expose your search capability as a tool and let the model call it. For large × live this often beats a vector store by miles — the model decides what to look up, you hit your real search system, and you skip the whole "we have to rebuild the index" class of bug.

It's also composable. A single assistant can mix tool-based search over a CRM with pre-indexed retrieval over a product manual. The model picks the right surface for the question.
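A sketch of what exposing search as a tool looks like. The tool definition follows the JSON-schema shape the Anthropic Messages API expects for its `tools` parameter; the `search_crm` name and the backend are hypothetical.

```python
# A search tool the model can call, described in the input_schema
# format used by the Anthropic Messages API.
SEARCH_TOOL = {
    "name": "search_crm",
    "description": "Full-text search over CRM records. Returns top matches.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms"},
            "limit": {"type": "integer", "description": "Max results"},
        },
        "required": ["query"],
    },
}

def handle_tool_call(name: str, args: dict, backend) -> str:
    """Dispatch a model-issued tool call to the real search system."""
    if name == "search_crm":
        hits = backend(args["query"], args.get("limit", 5))
        return "\n".join(hits) or "No results."
    raise ValueError(f"unknown tool: {name}")
```

The index-freshness problem disappears because there is no index: every call hits the live system.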

A short list of traps

Chunking too small. 200-token chunks match queries precisely but carry too little context to answer them. 800–1,500 tokens is a better default for most text.
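A greedy paragraph-level chunker aiming at that range might look like this. The ~4 chars/token estimate is a rough assumption; count with the model's tokenizer in a real pipeline.

```python
def chunk(paragraphs: list[str], target_tokens: int = 1000) -> list[str]:
    """Pack whole paragraphs into chunks of roughly target_tokens each.

    Splitting on paragraph boundaries keeps each chunk coherent,
    unlike fixed-width character splits.
    """
    chunks, current, size = [], [], 0
    for para in paragraphs:
        tokens = len(para) // 4  # crude estimate
        if current and size + tokens > target_tokens:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```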

Ignoring metadata filters. If every chunk carries a source, section, and updated_at, use them aggressively. Filtering before retrieval beats semantic magic.

Evaluating retrieval with your nose. Retrieval quality is measurable. Tag 50 queries with the chunks that should come back, then monitor recall@k whenever the index or embedding model changes.
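The metric itself is a few lines. Each eval item pairs a retriever's ranked output with the set of chunk IDs a human tagged as relevant for that query:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the tagged relevant chunks found in the top k results."""
    if not relevant:
        return 1.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def mean_recall(eval_set: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over the tagged query set. Rerun this after
    every index rebuild or embedding-model change."""
    return sum(recall_at_k(r, rel, k) for r, rel in eval_set) / len(eval_set)
```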

Forgetting the prompt. Great retrieval with a bad prompt produces confident, well-cited nonsense. Treat the generation prompt as part of the retrieval system, not a separate concern.

One-line version

Size picks the technique; freshness picks the refresh strategy. Get both right and the rest of the system falls into place.

* * *

Thanks for reading. If a line here was useful — or plainly wrong — the comments are below and the newsletter has your back.

Elsewhere in this issue

  1. Guides · Putting Claude on a schedule: routines, loops, and background work · Apr 20, 2026

  2. Guides · Writing a CLAUDE.md that actually helps · Apr 20, 2026

  3. Guides · A field guide to Claude Code: CLAUDE.md, hooks, skills, plugins · Apr 20, 2026

Letters

Arguments, corrections, questions. Anonymous comments allowed; be kind, be specific.