§ Guides
By AI Blog Editor
Apr 17, 2026 · 1 min read
RAG on two axes: size × freshness
Place your project on the grid and the right technique is usually one of five. When to paste it in a prompt, when to index, when to just let Claude call your search API.
Retrieval-augmented generation is often treated as a single architecture — chunk, embed, index, search, prompt — when it's really a family of techniques whose fit depends on your data's shape. Two axes do most of the disambiguation: how big the corpus is, and how fresh it needs to be.
Place your project on those two axes and the right answer is usually one of four or five things. Grab an axis wrong and you end up either over-engineering (full RAG stack for a 40k-token document that could've fit in a prompt) or under-engineering (pasting a 10 MB corpus into every turn and wondering why you hit rate limits).
Axis one: size
Small means the whole corpus fits comfortably in a prompt — tens of thousands of tokens, maybe a few hundred thousand with a long-context model. Don't index it. Stuff it in the system prompt, cache the prefix, call it done.
Medium is a few million tokens — too much for every turn, but small enough that you can afford a real index. Standard RAG territory: embeddings, a vector store, a retrieval step, a reranker if you're fancy.
Large is everything past that. You need a proper retrieval stack: chunking strategy, hybrid search (dense + BM25), versioned indexes, incremental ingest, and the operational muscle to run it.
Axis two: freshness
Static. The corpus rarely changes. A product manual, a coding standard, a catalogue of legal templates. Index once, reindex rarely.
Daily. Updates come in batches — a nightly ETL, a scheduled scrape. Reindex on a cron; plan for idempotent ingest so reruns are safe.
Live. The world changes minute-to-minute. Stock prices, open tickets, active chats. Pre-indexing loses to live queries almost immediately — by the time the index finishes, the answer has already changed.
The grid
Pick a cell in the grid below. The right technique for your project is on the right.
◆ Place your project
| STATIC | DAILY | LIVE | |
|---|---|---|---|
| SMALL | |||
| MEDIUM | |||
| LARGE |
Size · Medium · some million tokens
Freshness · Daily · updates occasionally
Recommendation
Retrieval with scheduled reindex
RAG with a nightly reindex job. Make sure your ingest pipeline is idempotent — reruns happen.
Retrieval-as-tool-use
One of the more useful shifts of the last couple of years: instead of pre-indexing everything for Claude, expose your search capability as a tool and let the model call it. For large × live this often beats a vector store by miles — the model decides what to look up, you hit your real search system, and you skip the whole "we have to rebuild the index" class of bug.
It's also composable. A single assistant can mix tool-based search over a CRM with pre-indexed retrieval over a product manual. The model picks the right surface for the question.
A short list of traps
Chunking too small. 200-token chunks retrieve well and fail to carry enough context. 800–1,500 is a better default for most text.
Ignoring metadata filters. If every chunk has a source, section, updated_at, use those filters aggressively. Filtering before retrieval beats semantic magic.
Evaluating retrieval with your nose. Retrieval quality is measurable. Tag 50 queries with the chunks that should come back, then monitor recall@k whenever the index or embedding model changes.
Forgetting the prompt. Great retrieval with a bad prompt produces confident, well-cited nonsense. Treat the generation prompt as part of the retrieval system, not a separate concern.
One-line version
Size picks the technique; freshness picks the refresh strategy. Get both right and the rest of the system falls into place.
* * *
Thanks for reading. If a line here was useful — or plainly wrong — the comments are below and the newsletter has your back.
Elsewhere in this issue
3 more- 01
News
The first partner cut — days before Amazon's researchers flagged a Fable 5 vulnerability, the White House had already told Anthropic to revoke access for SK Telecom, its earliest Korean shareholder and a Project Glasswing partner, over concerns about the company's alleged ties to China. Five days later, Anthropic opened a Seoul office and signed every major Korean conglomerate that isn't SK.
Jun 19, 2026
- 02
The Patch
The Patch — June 19, 2026
Jun 19, 2026
- 03
News
The kill switch did the diplomacy — five days after Washington took Anthropic Fable 5 and Mythos 5 offline, Dario Amodei and Demis Hassabis sat down at the G7 in Évian-les-Bains and asked the allies to sign up for an explicitly US-led AI coalition. Canada said yes; France brought a list.
Jun 18, 2026
Letters
Arguments, corrections, questions. Anonymous comments allowed; be kind, be specific.