Keeping RAG Honest When Your Documents Change
Your RAG demo was perfect. Then someone edited a doc, deleted a page, and shipped a new policy — and your assistant kept citing the old one. Here's how to build a retrieval layer that doesn't quietly rot.
The demo was flawless. We dropped a folder of policy PDFs into a knowledge base, wired up retrieval, and the assistant answered everything with crisp citations. Two weeks later, support escalated a ticket: the bot had confidently quoted a refund window that legal had changed a month earlier. Nobody touched the code. The documents had moved on without us.
This is the failure almost nobody teaches. Every RAG tutorial ends at “…and then it answers from your documents.” But documents are not a fixed thing you index once. They're alive — edited, versioned, deprecated, deleted, reorganized. The moment your corpus changes and your index doesn't, your beautifully grounded assistant starts grounding itself in the past.
If you only remember one sentence from this post, make it this one: a RAG system is only as fresh as its index, and your index does not update itself. Everything below is about closing the gap between what your documents say now and what your vector store thinks they say.
The quiet failure mode
Most production incidents are loud — a 500, a stack trace, a pager. Stale retrieval is the opposite. Nothing throws. The pipeline runs, the vector search returns chunks, the model writes a fluent, well-cited answer. It's just wrong, because the chunk it cited describes a world that no longer exists. The system is behaving exactly as designed; the design simply assumed the documents would hold still.
Staleness has no error signal. Your dashboards stay green, latency is fine, costs look normal. The only symptom is a slow erosion of answer quality that you won't notice until a user — or worse, a customer — does.
Documents change in more ways than you think
“The docs changed” hides at least four distinct events, and each one needs a different response from your indexing layer:
- Edits — a paragraph is rewritten, a number is updated. The document's identity is the same; its content isn't. You need to re-chunk and re-embed the affected parts.
- New versions — v2 of a contract supersedes v1, but v1 may still be legally relevant. Now you have a versioning problem, not just a freshness one.
- Deletions — a page is removed or a product is sunset. Its chunks must leave the index, or your assistant will keep citing a ghost.
- Reorganizations — content is split, merged, or moved between files. Chunk boundaries shift, IDs you relied on disappear, and naïve diffing sees the whole corpus as “new.”
Deletions are the one teams forget. Adding fresh content feels like progress, so re-ingestion pipelines tend to upsert and call it a day. But a vector store that only ever grows is a vector store that never forgets, and in retrieval, a confidently returned deleted chunk is indistinguishable from a current one.
Step one: detect what actually changed
Re-embedding your entire corpus on every change is simple and, for a few thousand documents, perfectly fine. It stops being fine the moment you have millions of chunks and an embedding bill to match. The scalable move is to only touch what changed — which means you need a cheap, reliable way to know what changed.
The workhorse here is content hashing. For every chunk (or every document, then every chunk), compute a stable hash of its normalized text. Store that hash alongside the vector as metadata. On the next ingestion run, hash the incoming content and compare:
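Say a document previously produced chunks A, B, and C. On the next run, A hashes the same, B hashes differently because it was edited, and C no longer appears in the source. Only B gets re-embedded; A is skipped (free); C is removed from the index. That's the whole cost-saving idea.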
```typescript
// A minimal change-detection pass over one document's chunks.
// `store` and `embed` are placeholders for your vector store client and embedding call.
import { createHash } from "node:crypto";

// Hash whitespace-normalized text so reflowed lines don't register as edits.
const hash = (text: string) =>
  createHash("sha256").update(text.trim().replace(/\s+/g, " ")).digest("hex");

async function reconcile(docId: string, freshChunks: string[]) {
  // What's currently indexed for this document?
  const existing = await store.list({ filter: { docId } }); // [{ id, contentHash }]
  const existingHashes = new Set<string>(existing.map((c) => c.contentHash));
  const seen = new Set<string>();

  for (const text of freshChunks) {
    const h = hash(text);
    seen.add(h);
    if (existingHashes.has(h)) continue; // unchanged → skip (no re-embed)
    const vector = await embed(text); // changed or new → embed
    await store.upsert({ id: `${docId}:${h}`, vector, text, contentHash: h, docId });
  }

  // Anything indexed but no longer present in the source was deleted.
  for (const c of existing) {
    if (!seen.has(c.contentHash)) await store.delete(c.id); // tombstone
  }
}
```

Whitespace, smart quotes, and trailing newlines will wreck your diff: every chunk will look “changed” after a harmless reformat. Normalize aggressively (collapse whitespace, standardize quotes) so the hash reflects meaning, not formatting noise.
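The normalization advice above can be made concrete with a small helper. This is a sketch under assumptions, not a canonical recipe: the names `normalizeForHash` and `contentHash` are hypothetical, and the exact set of rules (Unicode NFC, straightening curly quotes, collapsing whitespace) is one reasonable choice that you would tune to your own corpus.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: canonicalize text so the hash tracks wording, not formatting.
function normalizeForHash(text: string): string {
  return text
    .normalize("NFC")                  // unify composed and decomposed accents
    .replace(/[\u2018\u2019]/g, "'")   // curly single quotes → straight
    .replace(/[\u201C\u201D]/g, '"')   // curly double quotes → straight
    .replace(/\s+/g, " ")              // collapse every whitespace run to one space
    .trim();                           // drop leading and trailing whitespace
}

// Hypothetical helper: SHA-256 of the normalized text, as a hex string.
function contentHash(text: string): string {
  return createHash("sha256").update(normalizeForHash(text)).digest("hex");
}
```

With this in place, a chunk that was only reflowed or had its quotes restyled hashes the same as before, while a chunk whose wording changed does not. Be careful not to normalize away differences that matter in your domain, such as case in identifiers.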
Keep a small ingestion manifest per source: the document's own version or last-modified timestamp, plus the set of chunk hashes you produced. On the next run you can skip untouched documents entirely before you even chunk them, and you have an audit trail of exactly what the index believed at any point in time.
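One way to picture the manifest is as a per-document record that the ingestion run consults before doing any chunking. The sketch below assumes a hypothetical `ManifestEntry` shape and a `planIngestion` function; neither comes from a particular library, and a real manifest would live in a database or object store, not in memory.

```typescript
// Hypothetical manifest shape: one entry per source document.
interface ManifestEntry {
  sourceVersion: string;  // the source's own version or last-modified stamp
  chunkHashes: string[];  // hashes produced on the last successful ingestion
  ingestedAt: string;     // ISO timestamp, kept for the audit trail
}

type Manifest = Map<string, ManifestEntry>;

interface SourceDoc {
  docId: string;
  sourceVersion: string;
}

// Split the incoming sources into work to do, work to skip, and documents that vanished.
function planIngestion(manifest: Manifest, sources: SourceDoc[]) {
  const toProcess: string[] = [];
  const toSkip: string[] = [];
  const present = new Set<string>();

  for (const { docId, sourceVersion } of sources) {
    present.add(docId);
    const entry = manifest.get(docId);
    // Same version as last time: skip before chunking or hashing anything.
    if (entry && entry.sourceVersion === sourceVersion) toSkip.push(docId);
    else toProcess.push(docId);
  }

  // In the manifest but absent from the source listing: the whole document was deleted.
  const toDelete = Array.from(manifest.keys()).filter((id) => !present.has(id));
  return { toProcess, toSkip, toDelete };
}
```

Only the documents in `toProcess` need the chunk-level hash diff; everything in `toDelete` should have all of its chunks removed from the index.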
Step two: choose a re-indexing strategy
Once you know what changed, you have to decide how to apply it. There's no single right answer — it's a trade between simplicity, cost, and how much you can tolerate a half-updated index serving live traffic.
- Full rebuild — re-chunk and re-embed everything from scratch into a clean index. Dead simple, immune to drift, and easy to reason about. It's also the most expensive and slowest, so it works best on small corpora or on a nightly cadence.
- Incremental — use your hash diff to re-embed only changed and new chunks, and delete the gone ones, in place. Cheap and fast. The catch: while it runs, your index is momentarily inconsistent (some chunks updated, some not), which can produce briefly weird answers.
- Versioned / blue-green — build the updated index beside the live one, validate it, then flip traffic over atomically. The gold standard for anything user-facing.
For most teams the pragmatic path is incremental updates for routine edits, with a periodic full rebuild as a safety net to wash out any drift, fragmentation, or chunking-logic changes that incremental updates can accumulate over time.
Versioned indexes: never serve a half-built index
The single highest-leverage practice for a serious RAG system is to treat your index like you treat application deploys: immutable, versioned, and swapped atomically. You don't edit production in place while users are hitting it — you build the new version, run it through checks, and cut over.
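Many vector stores and search engines support this through aliases: Qdrant and Milvus offer collection aliases, and Elasticsearch and OpenSearch offer index aliases. Your application queries a stable name (say, kb-current) that points at a concrete underlying index (kb-2026-05-22). Re-indexing builds a new concrete index, you validate it, then you re-point the alias. Rollback is just pointing it back. Where a store has no alias feature, a pointer in your own configuration can play the same role. No user ever sees a partially updated state.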
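To make the cutover mechanics concrete, here is a minimal in-memory sketch of the alias pattern. `AliasRegistry` and `cutOver` are hypothetical names, not any store's real API, and the index name `kb-2026-05-15` in the usage note is invented for illustration. In a real system the flip would be a single alias-update call to your vector store.

```typescript
// Hypothetical in-memory stand-in for a vector store's alias feature.
class AliasRegistry {
  private aliases = new Map<string, string>();
  private history = new Map<string, string[]>();

  // Which concrete index does this alias currently point at?
  resolve(alias: string): string | undefined {
    return this.aliases.get(alias);
  }

  // Re-point the alias in one step and remember the previous target for rollback.
  flip(alias: string, target: string): void {
    const previous = this.aliases.get(alias);
    if (previous !== undefined) {
      const past = this.history.get(alias) ?? [];
      past.push(previous);
      this.history.set(alias, past);
    }
    this.aliases.set(alias, target);
  }

  // Point the alias back at whatever it served before the last flip.
  rollback(alias: string): boolean {
    const previous = this.history.get(alias)?.pop();
    if (previous === undefined) return false;
    this.aliases.set(alias, previous);
    return true;
  }
}

// Cut over only if the candidate passes validation; otherwise leave live traffic alone.
function cutOver(
  registry: AliasRegistry,
  alias: string,
  candidate: string,
  validate: (indexName: string) => boolean,
): boolean {
  if (!validate(candidate)) return false;
  registry.flip(alias, candidate);
  return true;
}
```

If `kb-current` points at `kb-2026-05-15`, a failed validation of `kb-2026-05-22` leaves the alias untouched, a passing one re-points it, and `rollback("kb-current")` restores the earlier index. Keep the old concrete index around until you are confident in the new one, or rollback has nothing to point back to.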
Every chunk should travel with its source id, document version, last-updated timestamp, and content hash. This is what makes incremental diffing, deletions, version filtering (“only answer from the current contract”), and debugging a bad answer possible. Thin metadata is the root cause of most “why did it retrieve that?” mysteries.
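As an illustration of what that metadata buys you, the sketch below defines one possible chunk shape and a version filter built on it. The field names and the `currentVersionOnly` helper are assumptions made for this example, and it treats document versions as increasing integers. In practice you would usually push this filter into the store's query and not post-filter in application code.

```typescript
// Field names are illustrative; map them onto whatever your store calls them.
interface ChunkMetadata {
  sourceId: string;
  docVersion: number;   // assumed to increase with each new version
  updatedAt: string;    // ISO 8601 timestamp
  contentHash: string;
}

interface IndexedChunk {
  id: string;
  text: string;
  metadata: ChunkMetadata;
}

// Keep only chunks from the newest version of each source
// ("only answer from the current contract").
function currentVersionOnly(chunks: IndexedChunk[]): IndexedChunk[] {
  const latest = new Map<string, number>();
  for (const c of chunks) {
    const seen = latest.get(c.metadata.sourceId) ?? -Infinity;
    if (c.metadata.docVersion > seen) latest.set(c.metadata.sourceId, c.metadata.docVersion);
  }
  return chunks.filter((c) => c.metadata.docVersion === latest.get(c.metadata.sourceId));
}
```

The same fields answer the debugging question directly: given a bad citation, `sourceId`, `docVersion`, and `updatedAt` tell you which document state the chunk came from.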
The chunk that lost its context
Even with a perfectly fresh index, there's a subtler failure that gets worse as documents grow and change: a chunk, ripped out of its document and embedded on its own, often loses the context that made it meaningful. A sentence like “The figure rose 18% in this period” is useless in isolation — which figure, which period, which company?
Contextual embeddings (popularized by Anthropic's contextual retrieval work) fix this cheaply: before embedding a chunk, prepend a short, document-aware blurb that situates it. You generate that blurb once per chunk with a fast, cheap model — and because the surrounding document rarely changes when a single chunk does, you can cache it and only regenerate context for chunks whose neighborhood actually moved.
```typescript
// Prepend a short, generated context before embedding each chunk.
// `llm`, `embed`, `store`, and `hash` are placeholders for your own clients and helpers.
const context = await llm.complete({
  model: "fast-cheap-model",
  prompt: `Document: ${docTitle}
Here is a chunk from it:
"""${chunk}"""
In one sentence, situate this chunk within the document so it stands alone.`,
});

const enriched = `${context}\n\n${chunk}`;
const vector = await embed(enriched); // embed the context + chunk together
await store.upsert({ id, vector, text: chunk, context, contentHash: hash(chunk) });
```

Contextual embeddings raise recall, but exact terms (error codes, SKUs, names) still belong to keyword search. Blending dense vectors with a classic keyword index (BM25) and a reranker on top is the most reliable retrieval stack we know of, and it's resilient to the wording drift that comes with edited docs.
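The post doesn't say how to blend the dense and keyword results. One common, simple option is reciprocal rank fusion (RRF), sketched here as an assumption and not as the method behind the stack described above. It takes ranked lists of chunk ids from each retriever and needs no score calibration between them. The function name and the default `k = 60` (a conventional choice) are illustrative.

```typescript
// Reciprocal rank fusion: score(id) = sum over rankings of 1 / (k + rank),
// where rank starts at 1 for the top result of each ranking.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A chunk that both retrievers rank well beats one that only a single retriever found, which is the behavior you want when an edit changes the wording but not the key terms. The fused list is what you would hand to the reranker.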
You can't fix what you can't see
Because staleness is silent, you have to go looking for it. Treat retrieval like any other production system and instrument it:
- Log every retrieval — the query, the chunks returned, their scores, their source ids and versions. When an answer is wrong, you want to replay exactly what the model saw.
- Track which chunks get cited — chunks that are retrieved but never useful are noise; chunks that are cited constantly are load-bearing and deserve extra care when their source changes.
- Watch the freshness gap — alert when a source's last-modified time is newer than the index's last-ingested time for that source. That single metric catches most staleness before a user does.
- Sample and review — periodically pull real queries and eyeball the retrieved context. Drift hides in the long tail.
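The freshness-gap check from the list above is small enough to sketch. This assumes you can get a last-modified time from each source system and a last-ingested time from your ingestion records; `SourceState`, `staleSources`, and the `graceMs` parameter are hypothetical names for this example.

```typescript
interface SourceState {
  sourceId: string;
  lastModified: Date;         // reported by the source system
  lastIngested: Date | null;  // from your ingestion records; null means never indexed
}

// Return sources whose content changed after the index last ingested them,
// worst offenders first. `graceMs` tolerates normal pipeline lag.
function staleSources(states: SourceState[], graceMs = 0): { sourceId: string; gapMs: number }[] {
  const stale: { sourceId: string; gapMs: number }[] = [];
  for (const s of states) {
    const gapMs =
      s.lastIngested === null
        ? Infinity
        : s.lastModified.getTime() - s.lastIngested.getTime();
    if (gapMs > graceMs) stale.push({ sourceId: s.sourceId, gapMs });
  }
  return stale.sort((a, b) => b.gapMs - a.gapMs);
}
```

Run it on a schedule and alert when the result is non-empty for longer than your ingestion interval. A never-indexed source gets an infinite gap so it always sorts to the top.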
Re-indexing without an eval is a coin flip
Here's the trap: re-indexing feels safe, so teams ship it blind. But a chunking tweak, a new embedding model, or a botched deletion can quietly tank retrieval quality — and you've now baked that regression into your fresh, confident-looking index.
The fix is to gate every re-index behind an evaluation, exactly like you'd gate a code deploy behind tests. Maintain a golden set — a few dozen representative questions with known-good answers and the chunks that should be retrieved. Run it against the candidate index before you flip the alias. If retrieval recall or answer faithfulness drops, the new index doesn't ship.
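A minimal version of that gate might look like the following. It covers only the retrieval-recall half of the check (answer faithfulness needs a separate judge), and the `GoldenCase` shape, the synchronous `Retriever` signature, and the `tolerance` default are all assumptions for this sketch. A real retriever would be asynchronous and query the candidate index by name.

```typescript
interface GoldenCase {
  question: string;
  expectedChunkIds: string[];  // chunks a good retriever should return
}

// Hypothetical retriever signature: returns ranked chunk ids for a question.
type Retriever = (question: string, k: number) => string[];

// Mean recall@k: the fraction of expected chunks found in the top k, averaged over cases.
function meanRecallAtK(cases: GoldenCase[], retrieve: Retriever, k: number): number {
  if (cases.length === 0) return 0;
  let total = 0;
  for (const c of cases) {
    const got = new Set(retrieve(c.question, k).slice(0, k));
    const hits = c.expectedChunkIds.filter((id) => got.has(id)).length;
    // A case with no expected chunks counts as trivially satisfied.
    total += c.expectedChunkIds.length === 0 ? 1 : hits / c.expectedChunkIds.length;
  }
  return total / cases.length;
}

// Ship the candidate only if it does not fall below the live index by more than `tolerance`.
function shouldShip(liveRecall: number, candidateRecall: number, tolerance = 0.02): boolean {
  return candidateRecall >= liveRecall - tolerance;
}
```

Comparing against the live index, not against a fixed threshold, keeps the gate meaningful as the golden set grows. Note that expected chunk ids derived from content hashes change whenever chunking changes, so in that case the golden set may need to reference source passages instead.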
Detect change → re-embed only what moved → build a new versioned index → run the golden-set eval → flip the alias if it passes → keep the freshness metric green. None of these steps is exotic. The teams whose RAG stays trustworthy are simply the ones who made this loop automatic and unglamorous.
A practical playbook
1. Attach rich metadata to every chunk from day one: source id, version, updated_at, and content_hash. You can't add this retroactively without a full rebuild.
2. Normalize text, then hash each chunk. Diff against the index to find edits, additions, and (critically) deletions.
3. Re-embed only what changed; tombstone what's gone. Keep a periodic full rebuild as a drift-washing safety net.
4. Build re-indexes into a new versioned index; never mutate the live one in place.
5. Gate the cutover on a golden-set eval. Flip the alias only if quality holds.
6. Use contextual embeddings + hybrid search + a reranker so retrieval survives wording drift.
7. Instrument retrieval and alert on the freshness gap. Silence is not success.
Where this lands in AgentSwarms
We built the Knowledge Base in AgentSwarms with exactly these failure modes in mind. Documents are chunked and embedded for you, chunk inserts are idempotent (so a re-ingestion run won't duplicate or corrupt your index), and the UI surfaces an embedding_failed status so a silent half-indexed document doesn't slip by. You can feel the whole retrieval loop end-to-end — including what a broken one looks like — in the Failure-Mode Labs.
AgentSwarms is a learning and prototyping platform, not a production RAG runtime. The point of this post isn't to sell you our index — it's to give you the mental model and the playbook so that whatever you run in production stays honest as your documents change.
Your documents will keep changing. That's not a bug in your knowledge base — it's the whole reason it exists. Build the loop that keeps up with them, and your assistant stops being a snapshot of last month and starts being a reliable window into what's true right now.