IBM Fusion & NVIDIA Blackwell: storage for AI on-premises

IBM Storage · NVIDIA · AI

IBM Fusion & NVIDIA Blackwell: storage now processes data for AI.

GTC 2026 brought an IBM-NVIDIA collaboration far deeper than it appears. Fusion is no longer just storage for containers: with Content-Aware Storage and Blackwell GPUs, storage becomes an active AI data preparation engine — the critical layer for enterprise RAG at scale.


On 16 March, IBM took the stage at GTC 2026 in San Jose with an announcement that passed largely unnoticed outside storage circles: an expanded collaboration with NVIDIA spanning Blackwell Ultra GPUs in IBM Cloud, GPU-native data analytics, intelligent document processing, and on-premises deployments for regulated industries.

Three weeks later, IBM published a technical Redbook detailing how to integrate Storage Scale, Fusion and Content-Aware Storage (CAS) with the NVIDIA AI Data Platform. And recently, IBM, NVIDIA and Samsung demonstrated a CAS system capable of managing 100 billion vectors on a single server — the kind of scale that breaks traditional RAG pipelines.

What does this actually mean in practice? Is it a real architectural shift or keynote marketing? Here's our analysis.

The announcement

GTC 2026: IBM and NVIDIA get serious about enterprise AI

What IBM announced at GTC is not a generic partnership. These are five concrete workstreams that directly affect how enterprises deploy AI on-premises — and all of them connect back to storage for AI:

  • NVIDIA Blackwell Ultra GPUs on IBM Cloud — available from Q2 2026 for large-scale training, high-throughput inference and AI reasoning.
  • Content-Aware Storage (CAS) integrated into the next Fusion release — storage stops being passive and starts processing data for AI.
  • Red Hat AI Factory with NVIDIA — OpenShift + NVIDIA GPUs as the standardised platform for deploying AI in production.
  • IBM Consulting + NVIDIA Blueprints — integration services to move AI from pilot to production.
  • NVIDIA AI Data Platform (AIDP) support — a reference design integrating compute, networking and storage into a unified AI system.
Source: IBM Newsroom, 16 March 2026

The most impactful data point for on-premises infrastructure: Fusion HCI already includes GPU servers with NVIDIA H200 and RTX Pro 6000 Blackwell Edition. This is not a roadmap — the hardware is available today. Each system supports up to 4 GPU servers with 8 cards each.

To understand how the pieces fit together: IBM's AIDP reference architecture on Fusion stacks OpenShift for compute, NVIDIA GPUs for acceleration, Storage Scale with CAS for intelligent storage, and Spectrum-X networking with BlueField-3 DPUs.

Rob Davis, VP of Storage Networking Technology at NVIDIA, was direct: AI agents need to access, retrieve and process data at scale, and today those steps happen in separate silos. The integration of CAS with NVIDIA orchestrates data and compute across an optimised network fabric to overcome those silos.

The technology

Content-Aware Storage: when storage understands what it holds

This is the most interesting part of the announcement and the least covered. Until now, enterprise storage was a passive repository: it stored files and served them on request. To run RAG (Retrieval-Augmented Generation) or feed AI models with corporate data, you needed a separate pipeline that extracted documents, chunked them, vectorised them and pushed them into a vector database.

CAS eliminates that external pipeline. It operates in two phases:

Phase 1: Continuous ingestion and preparation

CAS monitors folders in Storage Scale (or external storage via AFM) and detects changes in real time. When a document is modified or added, CAS processes it automatically: content extraction from text, tables, charts and images using NVIDIA NeMo Retriever, semantic chunking, and conversion into high-dimensional embeddings. Vectors are indexed in a CAS-managed vector database on Storage Scale ECE.
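
To make Phase 1 concrete, here is a minimal Python sketch of the ingestion loop as we understand it. This is not the CAS implementation or its API: embed and index are hypothetical stand-ins for NeMo Retriever and the CAS-managed vector database, and the polling loop stands in for Storage Scale's native change events.

```python
# Conceptual sketch of a CAS-style ingestion loop. Not the CAS API:
# `embed` and `index` are hypothetical stand-ins for NeMo Retriever
# and the CAS-managed vector database on Storage Scale ECE.
import hashlib
import os
import time

def chunk(text, size=800, overlap=100):
    """Naive fixed-size chunking; CAS applies semantic chunking instead."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(path, index, embed, seen):
    """Re-vectorise a file only when its content actually changed."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if seen.get(path) == digest:
        return                               # unchanged since last pass
    seen[path] = digest
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = f.read()
    for i, piece in enumerate(chunk(text)):
        vec = embed(piece)                   # -> embedding vector
        index.upsert(f"{path}#{i}", vec, metadata={"path": path})

def watch(folder, index, embed, interval=5):
    """Poll for changes; CAS instead hooks Storage Scale change events."""
    seen = {}
    while True:
        for name in os.listdir(folder):
            p = os.path.join(folder, name)
            if os.path.isfile(p):
                ingest(p, index, embed, seen)
        time.sleep(interval)
```

The point of the sketch is the contract, not the code: once this loop is the storage layer's job, application teams stop owning a fragile ETL pipeline.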

Phase 2: Query and retrieval

When a user or AI agent asks a question, CAS performs semantic search, keyword (BM25) or hybrid retrieval. Results pass through an NVIDIA-optimised reranker for maximum relevance. Critically: vectors inherit the access controls (ACLs) from the original documents. If a user cannot read a file, they cannot see its vectors in RAG results either.
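
For the query side, a similarly hedged sketch of Phase 2. The retrieval, reranking and permission checks are hypothetical callables passed in by the caller; only the flow (hybrid recall, then ACL filtering, then reranking) mirrors what the Redbook describes.

```python
# Sketch of a CAS-style query flow. Illustrative only: vector_search,
# bm25_search, rerank and can_read are hypothetical stand-ins.
from dataclasses import dataclass, field

@dataclass
class Hit:
    doc_id: str
    score: float
    metadata: dict = field(default_factory=dict)

def retrieve(query, user, vector_search, bm25_search, rerank, can_read, k=8):
    # 1. Hybrid recall: merge semantic and keyword (BM25) candidates.
    candidates = {h.doc_id: h for h in vector_search(query, top=50)}
    for h in bm25_search(query, top=50):
        candidates.setdefault(h.doc_id, h)

    # 2. ACL filter: vectors inherit the source document's permissions,
    #    so a user who cannot read a file never sees its chunks.
    visible = [h for h in candidates.values()
               if can_read(user, h.metadata["path"])]

    # 3. Rerank survivors (CAS uses an NVIDIA-optimised reranker here)
    #    and return the top k.
    return rerank(query, visible)[:k]
```

Note where step 2 sits: permissions are enforced before anything reaches the reranker or the model, which is what makes the inherited-ACL claim meaningful.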

Source: IBM Redbook MD248598 — Enabling AI Inference at Scale, April 2026

Why this matters

Most enterprise RAG deployments fail at two points: data goes stale because nobody updates the vector database, and there is no access control on the vectors. CAS solves both problems at the infrastructure layer, not the application layer. That is a genuine paradigm shift.

IBM + NVIDIA + Samsung demo: 100 billion vectors on a single server, with decoupled compute and storage and GPU-accelerated hierarchical indexing. At that scale, traditional RAG indices become unmanageable.

Source: SDxCentral, April 2026
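
Some quick arithmetic shows why that scale breaks flat indices. Assuming 1024-dimensional fp16 embeddings (our assumption; the demo's parameters were not disclosed):

```python
# Back-of-the-envelope footprint of a flat index at the demo's scale.
# 1024-dim fp16 embeddings are an assumption, not a disclosed figure.
vectors = 100e9                            # 100 billion vectors
dims, bytes_per_dim = 1024, 2              # fp16
raw_tb = vectors * dims * bytes_per_dim / 1e12
print(f"{raw_tb:.0f} TB of raw vectors")   # ~205 TB before index overhead
```

Around 200 TB of raw vectors, before any graph or index structures, is far beyond a single server's RAM, which is exactly why the demo decouples storage from compute and pushes hierarchical indexing onto the GPU.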
The hardware

H200, RTX Pro 6000 and Blackwell Ultra: which GPU goes where

There are three NVIDIA GPU lines in the IBM ecosystem that are worth keeping straight. Each has a distinct role; here is where each one deploys and what it is for:

NVIDIA Blackwell Ultra
GTC 2026 · Cloud-first · IBM Cloud

  • Availability: IBM Cloud · Q2 2026
  • Use case: large-scale training, high-throughput inference, AI reasoning
  • Deployment: cloud only · no on-prem option in Fusion
  • Integration: Red Hat AI Factory + VPC servers with compliance controls

If your workload can go to cloud and you have no data residency restrictions, Blackwell Ultra on IBM Cloud is the most powerful option in the catalogue. But if your data cannot leave the perimeter, look at the two on-prem options below.
NVIDIA H200
Hopper · extended HBM3e memory · Fusion HCI on-prem

  • Availability: Fusion HCI · May 2026
  • Use case: training, fine-tuning and heavy LLM inference
  • Memory: 141 GB HBM3e · 4.8 TB/s bandwidth
  • Configuration: up to 8 GPUs per server · up to 4 GPU servers per system
  • Maximum total: 32 GPUs per Fusion system

The H200 is the option for serious on-premises training. If you've read our article on vLLM inference on IBM Power, this is the x86 equivalent for Fusion HCI. Its HBM3e memory, larger than the H100's, makes it ideal for large models that previously required aggressive sharding. In Fusion HCI it accesses Storage Scale ECE directly over a 200 GbE fabric.
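
As a rough illustration of what 141 GB buys you, a weights-only fit check. It deliberately ignores KV cache and activation memory, so it is optimistic, and the model sizes are arbitrary examples rather than a supported-models list.

```python
# Weights-only fit check against one H200 (141 GB HBM3e). Optimistic:
# ignores KV cache and activations; keeps ~10% headroom for the runtime.
def fit(params_billion, bytes_per_param, hbm_gb=141, headroom=0.9):
    need_gb = params_billion * bytes_per_param   # 1B params at 2 B/param = 2 GB
    return need_gb, need_gb <= hbm_gb * headroom

for name, params in [("8B", 8), ("70B", 70), ("180B", 180)]:
    for fmt, bpp in [("fp16", 2), ("fp8", 1)]:
        need, ok = fit(params, bpp)
        verdict = "fits on one GPU" if ok else "shard or quantise"
        print(f"{name} {fmt:>4}: {need:5.0f} GB -> {verdict}")
```

A 70B model in fp16 is about 140 GB of weights alone, right at the edge of a single card; in fp8 it fits comfortably, and anything larger shards across the GPUs in the same Fusion system.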
NVIDIA RTX Pro 6000
Blackwell Edition · inference + visualisation · Fusion + AIDP

  • Availability: Fusion HCI · May 2026
  • Use case: inference, RAG, CAS vectorisation, professional visualisation
  • Architecture: Blackwell Server Edition · 96 GB GDDR7
  • Configuration: up to 8 GPUs per server · up to 4 GPU servers per system
  • AIDP stack: adds BlueField-3 DPU · ConnectX-7/8 SuperNICs

The RTX Pro 6000 Blackwell is the GPU in the AIDP reference stack. It accelerates CAS semantic chunking and vectorisation, and combined with the BlueField-3 DPU it offloads network and storage processing from the main CPU. It is the critical piece for production CAS-RAG.

Source: IBM Redbook MD248598 — Reference AIDP stack

What is not obvious

BlueField-3 is not just a fast NIC. It is a DPU (Data Processing Unit) that offloads network, storage and security operations from the main CPU. In an AIDP system, the BlueField-3s accelerate communication between Storage Scale and the GPUs, reducing data access latency for real-time inference. It is a critical piece that does not appear in keynotes but makes the difference in real-world performance.
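
Redbook specifics aside, the general data-path idea is easy to show with GPUDirect Storage, which KvikIO (a real RAPIDS library) exposes from Python. The file path below is hypothetical, and the direct DMA path engages only on systems where GDS is configured; elsewhere KvikIO transparently falls back to a CPU copy.

```python
# Illustrative GPUDirect Storage read with RAPIDS KvikIO: with GDS
# configured, data DMAs from storage into GPU memory without a CPU
# bounce buffer, the same class of data-path offload AIDP leans on.
import cupy as cp
import kvikio

buf = cp.empty(256 * 1024 * 1024, dtype=cp.uint8)      # 256 MiB GPU buffer
f = kvikio.CuFile("/gpfs/dataset/shard-000.bin", "r")  # hypothetical path
try:
    nbytes = f.read(buf)          # storage -> GPU, bypassing host memory
finally:
    f.close()
print(f"read {nbytes} bytes into GPU memory")
```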

The analysis

What this means for on-premises AI

Putting all the pieces together, the IBM message is clear: Fusion is no longer a container storage product. It is an on-premises AI platform integrating compute (OpenShift), acceleration (NVIDIA GPUs), intelligent storage (Storage Scale + CAS) and optimised networking (Spectrum-X + BlueField-3) in a unified appliance.

For organisations that cannot — or choose not to — send their data to the cloud, this is significant. Especially in three scenarios:

Regulated industries

Banking, healthcare, public sector. Data cannot leave the perimeter. With Fusion HCI + CAS + NVIDIA GPUs you can run corporate RAG on internal documents without anything leaving the rack. And ACLs are enforced at the vector level — compliance built-in, not bolted-on.

AI on proprietary data at scale

IBM estimates 80-90% of enterprise data is unstructured. CAS converts that volume into AI-consumable data continuously and automatically. This is not a one-off ETL project — it is a permanent infrastructure capability.

Alternative to cloud when TCO does not add up

IBM keeps repeating the figure of Databricks-equivalent performance at 60% of the cost. This is an internal benchmark on selected operations, so it deserves some scepticism. But the economic logic of on-premises for predictable, high-volume workloads remains solid. If you know you'll have 30 GPUs running 24/7, on-premises TCO usually wins.
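
To make that logic concrete, a toy comparison. Every figure here is an illustrative assumption on our part, not a quote from IBM, Databricks or any cloud price list; swap in your own numbers.

```python
# Toy TCO sketch: 30 GPUs running 24/7 for a year, cloud vs on-prem.
# All prices are assumptions for illustration only.
gpus, hours = 30, 24 * 365

cloud_rate = 4.00                               # assumed $/GPU-hour
cloud_year = gpus * hours * cloud_rate          # ~ $1.05M per year

capex = 1_500_000                               # assumed rack, GPUs, storage
amort_years = 4                                 # straight-line amortisation
opex_year = 200_000                             # assumed power, cooling, support
onprem_year = capex / amort_years + opex_year   # ~ $575k per year

print(f"cloud:   ${cloud_year:,.0f}/yr")
print(f"on-prem: ${onprem_year:,.0f}/yr")
```

Under these assumptions on-prem comes in at roughly half the cloud bill, and the gap widens the longer the hardware stays busy; at low or bursty utilisation the comparison flips.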

Our take

Real or marketing?

A bit of both, as always. What is unambiguously real:

  • The hardware exists and can be bought today. The H200 and RTX Pro 6000 are available as GPU servers for Fusion HCI. This is not a roadmap.
  • CAS works. The 100-billion-vector demo is verifiable. The Redbook details the architecture step by step.
  • NVIDIA AIDP is a real reference design with early adoption in healthcare (UT Southwestern Medical Center) and finance.
  • Red Hat AI Factory standardises OpenShift + GPU deployment as an AI platform — exactly what Fusion HCI delivers as an appliance.

What deserves some nuance:

  • CAS is not yet in Fusion GA. IBM said Q2 2025, then Q2 2026. It's been integrated in Storage Scale since March 2025, but the embedded Fusion version is still landing.
  • The 60% cost vs Databricks figure is an internal benchmark under controlled conditions. In real production, the benefit will depend on your workload.
  • Fusion HCI is not cheap. A rack with H200 GPUs, 16 storage nodes and OpenShift licences is a significant investment. It makes sense for organisations with sensitive data and predictable workloads — not for an AI pilot.

SIXE take

The most significant part of this wave is not the GPUs — everyone has those. It is CAS. Storage that semantically understands what it holds and maintains a real-time vector database with inherited ACLs is a genuine architectural shift. If it works as promised (and the demos suggest it does), it resolves the two main problems with enterprise RAG: data freshness and access security.

That said, not everyone needs Fusion HCI to benefit. CAS lives in Storage Scale, which can also be deployed as software-defined on your own hardware. And if your data volume does not justify Storage Scale, Ceph with a conventional RAG pipeline remains a viable and more cost-effective alternative.

As always, the answer depends on volume, data sensitivity and budget. We'll help you evaluate it.


Evaluating on-premises AI?

Tell us your use case. We help you size the right solution.

Fusion HCI, Fusion Software, standalone Storage Scale or Ceph — it depends on what you need. We do not sell a single solution; we help you choose the right one.

SIXE