Why your RAG pipeline serves last month’s data


Re-vectorizing thousands of documents every time something changes doesn't scale. IBM Fusion CAS integrates vectorization directly into storage: documents change, vectors update themselves.


IBM Fusion CAS (Content Aware Storage) is a capability built into IBM Fusion that vectorizes, indexes, and keeps documents continuously updated directly in the storage layer — without moving data or rebuilding the vector index.

If you have a RAG pipeline in production, you've probably run into this: documents change, but vectors don't. The contract was amended in March, the chatbot still answers with the December version. It's not a model problem — it's that nobody re-ran the ingestion pipeline. CAS solves exactly that.

Three numbers frame the problem:

  • 80–90% of enterprise data is unstructured (source: IBM Redbooks).
  • 40% of AI prototypes never reach production, due to data quality.
  • 0 data copies required with CAS — zero-copy ingestion.
01 · The problem

Why do vector embeddings in a RAG pipeline go stale?

Between 80% and 90% of enterprise data is unstructured — PDFs, scanned documents, spreadsheets, contracts, support tickets. In a conventional RAG pipeline, the flow to make them accessible to AI is: extract documents → parse → generate embeddings → load into a vector database → search when a query arrives. It works. Until the documents change.
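That five-step flow fits in a toy sketch. Everything here is a stand-in: `embed` fakes an embedding model with a hash, chunking is fixed-size rather than semantic, and the "vector database" is a plain Python list.

```python
import hashlib

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: derive a tiny deterministic
    # vector from a hash of the text. Real pipelines call a model here.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

def chunk(document: str, size: int = 200) -> list[str]:
    # Naive fixed-size chunking; real pipelines split on semantic boundaries.
    return [document[i:i + size] for i in range(0, len(document), size)]

def ingest(documents: dict[str, str]) -> list[dict]:
    # extract -> parse -> embed -> load: one index entry per chunk.
    index = []
    for doc_id, text in documents.items():
        for piece in chunk(text):
            index.append({"doc": doc_id, "text": piece, "vector": embed(piece)})
    return index

def search(index: list[dict], query: str, k: int = 3) -> list[dict]:
    # Toy similarity: negative squared L2 distance to the query vector.
    query_vector = embed(query)
    def score(entry: dict) -> float:
        return -sum((a - b) ** 2 for a, b in zip(entry["vector"], query_vector))
    return sorted(index, key=score, reverse=True)[:k]
```

The catch is the step this sketch leaves implicit: nothing in it notices when a document changes after `ingest` has already run.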

Versioned technical manuals, contracts with addenda, quarterly financial reports, support tickets that get reopened. Every time something changes, you have to re-run the entire pipeline. With thousands of documents, that means hours of GPU time, massive data movement between systems, and a team babysitting the process. According to IBM, 40% of AI prototypes never reach production precisely because of data quality and availability issues.

The usual alternative is to skip re-vectorization. And then your AI answers with two-month-old information.
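What CAS builds into the storage layer can be approximated by hand with content hashing, so only documents whose bytes actually changed get re-embedded. A minimal sketch; the surrounding pipeline is assumed and the names are illustrative:

```python
import hashlib
from pathlib import Path

def content_hash(path: Path) -> str:
    # Fingerprint the file's bytes; any edit changes the digest.
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_documents(paths: list[Path], state: dict[str, str]) -> list[Path]:
    """Return only the files whose content changed since the last run.
    `state` maps path -> last-seen digest and is updated in place."""
    stale = []
    for path in paths:
        digest = content_hash(path)
        if state.get(str(path)) != digest:
            stale.append(path)
            state[str(path)] = digest
    return stale
```

Even this hand-rolled version still needs something to call it: a cron, a queue, a human. That trigger is the part CAS absorbs into storage.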

The security gap nobody sees

In most RAG deployments, vectorization strips the access controls off the original documents. The chatbot has access to the entire vector index, so a sales rep can suddenly extract financial information they shouldn't see, because the file's ACLs were never propagated to the vectors. CAS closes this gap: vectors inherit the permissions of the source document.
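What "vectors inherit the permissions" means in practice can be shown with a toy filter: each chunk carries the ACL of its source file, and retrieval filters on it before ranking. A hypothetical illustration, not CAS's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    doc: str
    text: str
    allowed_groups: frozenset  # ACL copied verbatim from the source file

def retrieve(index: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    # Filter by permissions BEFORE similarity ranking: a user can only
    # retrieve chunks whose source document they could open directly.
    return [c for c in index if c.allowed_groups & user_groups]
```

The ordering matters: filtering after ranking, or worse after generation, means the restricted text has already reached the model's context.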

02 · The solution

What is IBM Fusion CAS and what does it do?

CAS (Content Aware Storage) is a capability built into IBM Fusion that operates on top of Storage Scale. It's not a separate product. Storage goes from being a place where bytes are kept to understanding what's inside each file: its structure, its semantics, and how it has changed since it was last processed.

[Figure: AI-Q Research Assistant architecture on IBM Fusion CAS — ingestion, vectorization, and RAG query flow. Source: IBM Community, Sandeep Zende]
Capability by capability (traditional RAG pipeline → IBM Fusion CAS):

  • Data movement: copy to an external system → zero-copy, in place.
  • Vector updates: full re-ingestion → automatic and incremental.
  • Change detection: manual / cron → real-time.
  • Access control on vectors: not propagated → ACLs inherited from the source document.
  • GPU acceleration: inference only → accelerated from ingestion onward.
  • Orchestration: scripts + crons + queues → built into storage.

If you already use Docling (or LibrePower's port for IBM Power) with Milvus and an LLM, you don't need CAS for that to work. A deployment with a few hundred PDFs that rarely change is well served by an orchestrated pipeline and a cron. The tipping point comes when documents number in the tens of thousands, change daily, and access control matters.
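The cron-based alternative is often literally one line; its weakness is the fixed window between runs, not complexity. A hypothetical crontab entry (the script path and flags are invented for illustration):

```shell
# Hypothetical nightly re-ingestion: re-vectorize the whole corpus at 02:00.
# A document changed right after a run waits up to 24 hours to reach the index.
0 2 * * * /opt/rag/bin/reingest.sh --source /data/docs --collection contracts
```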

03 · How it works

How does CAS process documents without moving them out of storage?

IBM Fusion CAS flow — ingestion and query:

  1. Document lands or changes in Storage Scale. PDFs, scans, tables, contracts — CAS detects the event automatically.
  2. GPU-accelerated extraction and semantic chunking. OCR, table recognition, layout analysis — all in storage, no copies.
  3. Embedding generation with NeMo Retriever. Vectorization on NVIDIA Blackwell GPUs (RTX PRO 6000), with linear scaling.
  4. Incremental indexing in the integrated vector database. Only what changed gets updated, with ACLs inherited from the source document.
  5. RAG query: retrieve → reason → refine → respond. The AI-Q Research Assistant runs an iterative loop with Nemotron + Llama-3, not a single-shot answer.

  ↻ Continuous loop — data is automatically re-processed whenever it changes.

The key difference from a conventional pipeline: there is no manual step between "the document changed" and "the vector index reflects that change." CAS closes that gap automatically, with NVIDIA Blackwell GPUs accelerating every phase, not just final inference. Ingestion and query throughput scales linearly as NVIDIA RTX PRO 6000 GPUs are added, as documented in the IBM Redbook on the NVIDIA AI Data Platform. On BEIR benchmarks (the industry standard for evaluating semantic retrieval), IBM reports CAS outperforming the most advanced retrieval systems on the market.
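To make the "no manual step" contract concrete, here is the polling equivalent a hand-built pipeline would use; CAS replaces the polling with storage-level events, but the input/output behavior is the same. All names are illustrative:

```python
import time
from pathlib import Path

def watch_and_reindex(root: Path, reindex, cycles: int = 3, interval: float = 1.0):
    """Poll `root` and call reindex(path) on every file whose mtime advanced.
    CAS reacts to storage events instead of polling, but the contract is
    the same: changed file in, updated vectors out, no human in between."""
    mtimes: dict[str, float] = {}
    for cycle in range(cycles):
        for path in sorted(root.rglob("*")):
            if path.is_file():
                mtime = path.stat().st_mtime
                if mtimes.get(str(path), 0.0) < mtime:
                    mtimes[str(path)] = mtime
                    reindex(path)  # e.g. re-chunk, re-embed, upsert the vectors
        if cycle < cycles - 1:
            time.sleep(interval)
```

Polling at scale is exactly the pain described above: tens of thousands of files, daily changes, and a loop (plus its failure modes) that someone has to operate.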

04 · Deployment

On-premises because there is no alternative

The entire architecture runs on-premises. This isn't a preference: if your data falls under GDPR, the EU AI Act, EBA banking regulations, or classified information requirements, sending it to a cloud API for vectorization is not a legal option.

It's the same philosophy we described when talking about building an on-premises AI factory with Ceph and Kubernetes, with one difference: CAS integrates data preparation directly into storage. No separate processing cluster to orchestrate, no message queues between NAS and pipeline, no temporary S3 buckets.

Storage Scale vs Ceph: a new argument

If you're evaluating which storage you need for AI workloads — the decision between Storage Scale and Ceph we covered last week — CAS tips the scale. It's something that only exists in the Storage Scale / Fusion ecosystem and has no direct equivalent in Ceph or any other distributed file system today.

05 · Scope

When does CAS make sense over a hand-built RAG pipeline?

CAS requires IBM Fusion on OpenShift. It's not a component you plug into any infrastructure. If your RAG works fine with Docling + Milvus + a cron job, you don't need this.

It makes sense when several of these conditions apply at once:

  • High volume of unstructured documents that change frequently.
  • Granular access control requirements — healthcare, banking, public administration, legal.
  • Existing or planned IBM infrastructure (Fusion, Storage Scale).
  • Need for the vector index to stay current without manual intervention.
  • Data sovereignty and European regulatory compliance.


Need to size an AI architecture on Fusion?

At SIXE we work with IBM Fusion, Storage Scale, and RAG pipelines in production. Tell us about your use case and we'll help you design the solution.

SIXE