Why your RAG pipeline serves last month’s data


Re-vectorizing thousands of documents every time something changes doesn't scale. IBM Fusion CAS integrates vectorization directly into storage: documents change, vectors update themselves.


IBM Fusion CAS (Content Aware Storage) is a capability built into IBM Fusion that vectorizes, indexes, and keeps documents continuously updated directly in the storage layer — without moving data or rebuilding the vector index.

If you have a RAG pipeline in production, you've probably run into this: documents change, but vectors don't. The contract was amended in March, the chatbot still answers with the December version. It's not a model problem — it's that nobody re-ran the ingestion pipeline. CAS solves exactly that.

Three numbers frame the problem:

  • 80–90% of enterprise data is unstructured (source: IBM Redbooks).
  • 40% of AI prototypes never reach production, due to data quality.
  • 0 data copies required with CAS — zero-copy ingestion.
01 · The problem

Why do vector embeddings in a RAG pipeline go stale?

Between 80% and 90% of enterprise data is unstructured — PDFs, scanned documents, spreadsheets, contracts, support tickets. In a conventional RAG pipeline, the flow to make them accessible to AI is: extract documents → parse → generate embeddings → load into a vector database → search when a query arrives. It works. Until the documents change.
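That five-step flow fits in a toy sketch. Everything here is a stand-in: `embed` fakes an embedding model with a hash, chunking is fixed-size rather than semantic, and the "vector database" is a plain Python list.

```python
import hashlib

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: derive a tiny deterministic
    # vector from a hash of the text. Real pipelines call a model here.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

def chunk(document: str, size: int = 200) -> list[str]:
    # Naive fixed-size chunking; real pipelines split on semantic boundaries.
    return [document[i:i + size] for i in range(0, len(document), size)]

def ingest(documents: dict[str, str]) -> list[dict]:
    # extract -> parse -> embed -> load: one index entry per chunk.
    index = []
    for doc_id, text in documents.items():
        for piece in chunk(text):
            index.append({"doc": doc_id, "text": piece, "vector": embed(piece)})
    return index

def search(index: list[dict], query: str, k: int = 3) -> list[dict]:
    # Toy similarity: negative squared L2 distance to the query vector.
    query_vector = embed(query)
    def score(entry: dict) -> float:
        return -sum((a - b) ** 2 for a, b in zip(entry["vector"], query_vector))
    return sorted(index, key=score, reverse=True)[:k]
```

The catch is the step this sketch leaves implicit: nothing in it notices when a document changes after `ingest` has already run.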

Versioned technical manuals, contracts with addenda, quarterly financial reports, support tickets that get reopened. Every time something changes, you have to re-run the entire pipeline. With thousands of documents, that means hours of GPU time, massive data movement between systems, and a team babysitting the process. According to IBM, 40% of AI prototypes never reach production precisely because of data quality and availability issues.

The usual alternative is to skip re-vectorization. And then your AI answers with two-month-old information.
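What CAS builds into the storage layer can be approximated by hand with content hashing, so only documents whose bytes actually changed get re-embedded. A minimal sketch; the surrounding pipeline is assumed and the names are illustrative:

```python
import hashlib
from pathlib import Path

def content_hash(path: Path) -> str:
    # Fingerprint the file's bytes; any edit changes the digest.
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_documents(paths: list[Path], state: dict[str, str]) -> list[Path]:
    """Return only the files whose content changed since the last run.
    `state` maps path -> last-seen digest and is updated in place."""
    stale = []
    for path in paths:
        digest = content_hash(path)
        if state.get(str(path)) != digest:
            stale.append(path)
            state[str(path)] = digest
    return stale
```

Even this hand-rolled version still needs something to call it: a cron, a queue, a human. That trigger is the part CAS absorbs into storage.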

The security gap nobody sees

In most RAG deployments, vectorization strips the access controls off the original documents. The chatbot has access to the entire vector index, so a sales rep can suddenly extract financial information they shouldn't see, because the file's ACLs were never propagated to the vectors. CAS closes this gap: vectors inherit the permissions of the source document.
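What "vectors inherit the permissions" means in practice can be shown with a toy filter: each chunk carries the ACL of its source file, and retrieval filters on it before ranking. A hypothetical illustration, not CAS's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    doc: str
    text: str
    allowed_groups: frozenset  # ACL copied verbatim from the source file

def retrieve(index: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    # Filter by permissions BEFORE similarity ranking: a user can only
    # retrieve chunks whose source document they could open directly.
    return [c for c in index if c.allowed_groups & user_groups]
```

The ordering matters: filtering after ranking, or worse after generation, means the restricted text has already reached the model's context.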

02 · The solution

What is IBM Fusion CAS and what does it do?

CAS (Content Aware Storage) is a capability built into IBM Fusion that operates on top of Storage Scale. It's not a separate product. Storage goes from being a place where bytes are kept to understanding what's inside each file: its structure, its semantics, and how it has changed since it was last processed.

[Figure: AI-Q Research Assistant architecture on IBM Fusion CAS — ingestion, vectorization, and RAG query flow. Source: IBM Community, Sandeep Zende]
Capability by capability (traditional RAG pipeline → IBM Fusion CAS):

  • Data movement: copy to an external system → zero-copy, in place.
  • Vector updates: full re-ingestion → automatic and incremental.
  • Change detection: manual / cron → real-time.
  • Access control on vectors: not propagated → ACLs inherited from the source document.
  • GPU acceleration: inference only → accelerated from ingestion onward.
  • Orchestration: scripts + crons + queues → built into storage.

If you already use Docling (or LibrePower's port for IBM Power) with Milvus and an LLM, you don't need CAS for that to work. A deployment with a few hundred PDFs that rarely change is well served by an orchestrated pipeline and a cron. The tipping point comes when documents number in the tens of thousands, change daily, and access control matters.
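The cron-based alternative is often literally one line; its weakness is the fixed window between runs, not complexity. A hypothetical crontab entry (the script path and flags are invented for illustration):

```shell
# Hypothetical nightly re-ingestion: re-vectorize the whole corpus at 02:00.
# A document changed right after a run waits up to 24 hours to reach the index.
0 2 * * * /opt/rag/bin/reingest.sh --source /data/docs --collection contracts
```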

03 · How it works

How does CAS process documents without moving them out of storage?

IBM Fusion CAS flow — ingestion and query:

  1. Document lands or changes in Storage Scale. PDFs, scans, tables, contracts — CAS detects the event automatically.
  2. GPU-accelerated extraction and semantic chunking. OCR, table recognition, layout analysis — all in storage, no copies.
  3. Embedding generation with NeMo Retriever. Vectorization on NVIDIA Blackwell GPUs (RTX PRO 6000), with linear scaling.
  4. Incremental indexing in the integrated vector database. Only what changed gets updated, with ACLs inherited from the source document.
  5. RAG query: retrieve → reason → refine → respond. The AI-Q Research Assistant runs an iterative loop with Nemotron + Llama-3, not a single-shot answer.

  ↻ Continuous loop — data is automatically re-processed whenever it changes.

The key difference from a conventional pipeline: there is no manual step between "the document changed" and "the vector index reflects that change." CAS closes that gap automatically, with NVIDIA Blackwell GPUs accelerating every phase, not just final inference. Ingestion and query throughput scales linearly as NVIDIA RTX PRO 6000 GPUs are added, as documented in the IBM Redbook on the NVIDIA AI Data Platform. On BEIR benchmarks (the industry standard for evaluating semantic retrieval), IBM reports CAS outperforming the most advanced retrieval systems on the market.
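To make the "no manual step" contract concrete, here is the polling equivalent a hand-built pipeline would use; CAS replaces the polling with storage-level events, but the input/output behavior is the same. All names are illustrative:

```python
import time
from pathlib import Path

def watch_and_reindex(root: Path, reindex, cycles: int = 3, interval: float = 1.0):
    """Poll `root` and call reindex(path) on every file whose mtime advanced.
    CAS reacts to storage events instead of polling, but the contract is
    the same: changed file in, updated vectors out, no human in between."""
    mtimes: dict[str, float] = {}
    for cycle in range(cycles):
        for path in sorted(root.rglob("*")):
            if path.is_file():
                mtime = path.stat().st_mtime
                if mtimes.get(str(path), 0.0) < mtime:
                    mtimes[str(path)] = mtime
                    reindex(path)  # e.g. re-chunk, re-embed, upsert the vectors
        if cycle < cycles - 1:
            time.sleep(interval)
```

Polling at scale is exactly the pain described above: tens of thousands of files, daily changes, and a loop (plus its failure modes) that someone has to operate.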

04 · Deployment

On-premises because there is no alternative

The entire architecture runs on-premises. This isn't a preference: if your data falls under GDPR, the EU AI Act, EBA banking regulations, or classified information requirements, sending it to a cloud API for vectorization is not a legal option.

It's the same philosophy we described when talking about building an on-premises AI factory with Ceph and Kubernetes, with one difference: CAS integrates data preparation directly into storage. No separate processing cluster to orchestrate, no message queues between NAS and pipeline, no temporary S3 buckets.

Storage Scale vs Ceph: a new argument

If you're evaluating which storage you need for AI workloads — the decision between Storage Scale and Ceph we covered last week — CAS tips the scale. It's something that only exists in the Storage Scale / Fusion ecosystem and has no direct equivalent in Ceph or any other distributed file system today.

05 · Scope

When does CAS make sense over a hand-built RAG pipeline?

CAS requires IBM Fusion on OpenShift. It's not a component you plug into any infrastructure. If your RAG works fine with Docling + Milvus + a cron job, you don't need this.

It makes sense when several of these conditions apply at once:

  • High volume of unstructured documents that change frequently.
  • Granular access control requirements — healthcare, banking, public administration, legal.
  • Existing or planned IBM infrastructure (Fusion, Storage Scale).
  • Need for the vector index to stay current without manual intervention.
  • Data sovereignty and European regulatory compliance.


Need to size an AI architecture on Fusion?

At SIXE we work with IBM Fusion, Storage Scale, and RAG pipelines in production. Tell us about your use case and we'll help you design the solution.

SIXE