IBM Fusion & NVIDIA Blackwell: storage for AI on-premises

IBM Storage · NVIDIA · AI

IBM Fusion & NVIDIA Blackwell: storage now processes data for AI.

GTC 2026 brought an IBM-NVIDIA collaboration far deeper than it appears. Fusion is no longer just storage for containers: with Content-Aware Storage and Blackwell GPUs, storage becomes an active AI data preparation engine — the critical layer for enterprise RAG at scale.


On 16 March, IBM took the stage at GTC 2026 in San Jose with an announcement that passed largely unnoticed outside storage circles: an expanded collaboration with NVIDIA spanning Blackwell Ultra GPUs in IBM Cloud, GPU-native data analytics, intelligent document processing, and on-premises deployments for regulated industries.

Three weeks later, IBM published a technical Redbook detailing how to integrate Storage Scale, Fusion and Content-Aware Storage (CAS) with the NVIDIA AI Data Platform. And recently, IBM, NVIDIA and Samsung demonstrated a CAS system capable of managing 100 billion vectors on a single server — the kind of scale that breaks traditional RAG pipelines.

What does this actually mean in practice? Is it a real architectural shift or keynote marketing? Here's our analysis.

The announcement

GTC 2026: IBM and NVIDIA get serious about enterprise AI

What IBM announced at GTC is not a generic partnership. These are five concrete workstreams that directly affect how enterprises deploy AI on-premises — and all of them connect back to storage for AI:

  • NVIDIA Blackwell Ultra GPUs on IBM Cloud — available from Q2 2026 for large-scale training, high-throughput inference and AI reasoning.
  • Content-Aware Storage (CAS) integrated into the next Fusion release — storage stops being passive and starts processing data for AI.
  • Red Hat AI Factory with NVIDIA — OpenShift + NVIDIA GPUs as the standardised platform for deploying AI in production.
  • IBM Consulting + NVIDIA Blueprints — integration services to move AI from pilot to production.
  • NVIDIA AI Data Platform (AIDP) support — a reference design integrating compute, networking and storage into a unified AI system.
Source: IBM Newsroom, 16 March 2026

The most impactful data point for on-premises infrastructure: Fusion HCI already includes GPU servers with NVIDIA H200 and RTX Pro 6000 Blackwell Edition. This is not a roadmap — the hardware is available today. Each system supports up to 4 GPU servers with 8 cards each.

To understand how the pieces fit together: IBM's AIDP reference architecture on Fusion stacks OpenShift for compute, NVIDIA GPUs for acceleration, Storage Scale with CAS for intelligent storage, and Spectrum-X networking with BlueField-3 DPUs.

Rob Davis, VP of Storage Networking Technology at NVIDIA, was direct: AI agents need to access, retrieve and process data at scale, and today those steps happen in separate silos. The integration of CAS with NVIDIA orchestrates data and compute across an optimised network fabric to overcome those silos.

The technology

Content-Aware Storage: when storage understands what it holds

This is the most interesting part of the announcement and the least covered. Until now, enterprise storage was a passive repository: it stored files and served them on request. To run RAG (Retrieval-Augmented Generation) or feed AI models with corporate data, you needed a separate pipeline that extracted documents, chunked them, vectorised them and pushed them into a vector database.

CAS eliminates that external pipeline. It operates in two phases:

Phase 1: Continuous ingestion and preparation

CAS monitors folders in Storage Scale (or external storage via AFM) and detects changes in real time. When a document is modified or added, CAS processes it automatically: content extraction from text, tables, charts and images using NVIDIA NeMo Retriever, semantic chunking, and conversion into high-dimensional embeddings. Vectors are indexed in a CAS-managed vector database on Storage Scale ECE.
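
To make Phase 1 concrete, here is a minimal Python sketch of the ingestion loop as we understand it. This is not the CAS implementation or its API: embed and index are hypothetical stand-ins for NeMo Retriever and the CAS-managed vector database, and the polling loop stands in for Storage Scale's native change events.

```python
# Conceptual sketch of a CAS-style ingestion loop. Not the CAS API:
# `embed` and `index` are hypothetical stand-ins for NeMo Retriever
# and the CAS-managed vector database on Storage Scale ECE.
import hashlib
import os
import time

def chunk(text, size=800, overlap=100):
    """Naive fixed-size chunking; CAS applies semantic chunking instead."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(path, index, embed, seen):
    """Re-vectorise a file only when its content actually changed."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if seen.get(path) == digest:
        return                               # unchanged since last pass
    seen[path] = digest
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = f.read()
    for i, piece in enumerate(chunk(text)):
        vec = embed(piece)                   # -> embedding vector
        index.upsert(f"{path}#{i}", vec, metadata={"path": path})

def watch(folder, index, embed, interval=5):
    """Poll for changes; CAS instead hooks Storage Scale change events."""
    seen = {}
    while True:
        for name in os.listdir(folder):
            p = os.path.join(folder, name)
            if os.path.isfile(p):
                ingest(p, index, embed, seen)
        time.sleep(interval)
```

The point of the sketch is the contract, not the code: once this loop is the storage layer's job, application teams stop owning a fragile ETL pipeline.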

Phase 2: Query and retrieval

When a user or AI agent asks a question, CAS performs semantic search, keyword (BM25) or hybrid retrieval. Results pass through an NVIDIA-optimised reranker for maximum relevance. Critically: vectors inherit the access controls (ACLs) from the original documents. If a user cannot read a file, they cannot see its vectors in RAG results either.
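
For the query side, a similarly hedged sketch of Phase 2. The retrieval, reranking and permission checks are hypothetical callables passed in by the caller; only the flow (hybrid recall, then ACL filtering, then reranking) mirrors what the Redbook describes.

```python
# Sketch of a CAS-style query flow. Illustrative only: vector_search,
# bm25_search, rerank and can_read are hypothetical stand-ins.
from dataclasses import dataclass, field

@dataclass
class Hit:
    doc_id: str
    score: float
    metadata: dict = field(default_factory=dict)

def retrieve(query, user, vector_search, bm25_search, rerank, can_read, k=8):
    # 1. Hybrid recall: merge semantic and keyword (BM25) candidates.
    candidates = {h.doc_id: h for h in vector_search(query, top=50)}
    for h in bm25_search(query, top=50):
        candidates.setdefault(h.doc_id, h)

    # 2. ACL filter: vectors inherit the source document's permissions,
    #    so a user who cannot read a file never sees its chunks.
    visible = [h for h in candidates.values()
               if can_read(user, h.metadata["path"])]

    # 3. Rerank survivors (CAS uses an NVIDIA-optimised reranker here)
    #    and return the top k.
    return rerank(query, visible)[:k]
```

Note where step 2 sits: permissions are enforced before anything reaches the reranker or the model, which is what makes the inherited-ACL claim meaningful.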

Source: IBM Redbook MD248598 — Enabling AI Inference at Scale, April 2026

Why this matters

Most enterprise RAG deployments fail at two points: data goes stale because nobody updates the vector database, and there is no access control on the vectors. CAS solves both problems at the infrastructure layer, not the application layer. That is a genuine paradigm shift.

IBM + NVIDIA + Samsung demo: 100 billion vectors on a single server, with decoupled compute and storage and GPU-accelerated hierarchical indexing. At that scale, traditional RAG indices become unmanageable.

Source: SDxCentral, April 2026
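
Some quick arithmetic shows why that scale breaks flat indices. Assuming 1024-dimensional fp16 embeddings (our assumption; the demo's parameters were not disclosed):

```python
# Back-of-the-envelope footprint of a flat index at the demo's scale.
# 1024-dim fp16 embeddings are an assumption, not a disclosed figure.
vectors = 100e9                            # 100 billion vectors
dims, bytes_per_dim = 1024, 2              # fp16
raw_tb = vectors * dims * bytes_per_dim / 1e12
print(f"{raw_tb:.0f} TB of raw vectors")   # ~205 TB before index overhead
```

Around 200 TB of raw vectors, before any graph or index structures, is far beyond a single server's RAM, which is exactly why the demo decouples storage from compute and pushes hierarchical indexing onto the GPU.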
The hardware

H200, RTX Pro 6000 and Blackwell Ultra: which GPU goes where

There are three NVIDIA GPU lines in the IBM ecosystem that are worth keeping straight. Each has a distinct role; here is where each one deploys and what it is for:

NVIDIA Blackwell Ultra
GTC 2026 · Cloud-first · IBM Cloud

  • Availability: IBM Cloud · Q2 2026
  • Use case: large-scale training, high-throughput inference, AI reasoning
  • Deployment: cloud only · no on-prem option in Fusion
  • Integration: Red Hat AI Factory + VPC servers with compliance controls

If your workload can go to cloud and you have no data residency restrictions, Blackwell Ultra on IBM Cloud is the most powerful option in the catalogue. But if your data cannot leave the perimeter, look at the two on-prem options below.
NVIDIA H200
Hopper · extended HBM3e memory · Fusion HCI on-prem

  • Availability: Fusion HCI · May 2026
  • Use case: training, fine-tuning and heavy LLM inference
  • Memory: 141 GB HBM3e · 4.8 TB/s bandwidth
  • Configuration: up to 8 GPUs per server · up to 4 GPU servers per system
  • Maximum total: 32 GPUs per Fusion system

The H200 is the option for serious on-premises training. If you've read our article on vLLM inference on IBM Power, this is the x86 equivalent for Fusion HCI. Its HBM3e memory, larger than the H100's, makes it ideal for large models that previously required aggressive sharding. In Fusion HCI it accesses Storage Scale ECE directly over a 200 GbE fabric.
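
As a rough illustration of what 141 GB buys you, a weights-only fit check. It deliberately ignores KV cache and activation memory, so it is optimistic, and the model sizes are arbitrary examples rather than a supported-models list.

```python
# Weights-only fit check against one H200 (141 GB HBM3e). Optimistic:
# ignores KV cache and activations; keeps ~10% headroom for the runtime.
def fit(params_billion, bytes_per_param, hbm_gb=141, headroom=0.9):
    need_gb = params_billion * bytes_per_param   # 1B params at 2 B/param = 2 GB
    return need_gb, need_gb <= hbm_gb * headroom

for name, params in [("8B", 8), ("70B", 70), ("180B", 180)]:
    for fmt, bpp in [("fp16", 2), ("fp8", 1)]:
        need, ok = fit(params, bpp)
        verdict = "fits on one GPU" if ok else "shard or quantise"
        print(f"{name} {fmt:>4}: {need:5.0f} GB -> {verdict}")
```

A 70B model in fp16 is about 140 GB of weights alone, right at the edge of a single card; in fp8 it fits comfortably, and anything larger shards across the GPUs in the same Fusion system.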
NVIDIA RTX Pro 6000
Blackwell Edition · inference + visualisation · Fusion + AIDP

  • Availability: Fusion HCI · May 2026
  • Use case: inference, RAG, CAS vectorisation, professional visualisation
  • Architecture: Blackwell Server Edition · 96 GB GDDR7
  • Configuration: up to 8 GPUs per server · up to 4 GPU servers per system
  • AIDP stack: adds BlueField-3 DPU · ConnectX-7/8 SuperNICs

The RTX Pro 6000 Blackwell is the GPU in the AIDP reference stack. It accelerates CAS semantic chunking and vectorisation, and combined with the BlueField-3 DPU it offloads network and storage processing from the main CPU. It is the critical piece for production CAS-RAG.

Source: IBM Redbook MD248598 — Reference AIDP stack

What is not obvious

BlueField-3 is not just a fast NIC. It is a DPU (Data Processing Unit) that offloads network, storage and security operations from the main CPU. In an AIDP system, the BlueField-3s accelerate communication between Storage Scale and the GPUs, reducing data access latency for real-time inference. It is a critical piece that does not appear in keynotes but makes the difference in real-world performance.
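
Redbook specifics aside, the general data-path idea is easy to show with GPUDirect Storage, which KvikIO (a real RAPIDS library) exposes from Python. The file path below is hypothetical, and the direct DMA path engages only on systems where GDS is configured; elsewhere KvikIO transparently falls back to a CPU copy.

```python
# Illustrative GPUDirect Storage read with RAPIDS KvikIO: with GDS
# configured, data DMAs from storage into GPU memory without a CPU
# bounce buffer, the same class of data-path offload AIDP leans on.
import cupy as cp
import kvikio

buf = cp.empty(256 * 1024 * 1024, dtype=cp.uint8)      # 256 MiB GPU buffer
f = kvikio.CuFile("/gpfs/dataset/shard-000.bin", "r")  # hypothetical path
try:
    nbytes = f.read(buf)          # storage -> GPU, bypassing host memory
finally:
    f.close()
print(f"read {nbytes} bytes into GPU memory")
```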

The analysis

What this means for on-premises AI

Putting all the pieces together, the IBM message is clear: Fusion is no longer a container storage product. It is an on-premises AI platform integrating compute (OpenShift), acceleration (NVIDIA GPUs), intelligent storage (Storage Scale + CAS) and optimised networking (Spectrum-X + BlueField-3) in a unified appliance.

For organisations that cannot — or choose not to — send their data to the cloud, this is significant. Especially in three scenarios:

Regulated industries

Banking, healthcare, public sector. Data cannot leave the perimeter. With Fusion HCI + CAS + NVIDIA GPUs you can run corporate RAG on internal documents without anything leaving the rack. And ACLs are enforced at the vector level — compliance built-in, not bolted-on.

AI on proprietary data at scale

IBM estimates 80-90% of enterprise data is unstructured. CAS converts that volume into AI-consumable data continuously and automatically. This is not a one-off ETL project — it is a permanent infrastructure capability.

Alternative to cloud when TCO does not add up

IBM keeps repeating the figure of Databricks-equivalent performance at 60% of the cost. This is an internal benchmark on selected operations, so it deserves some scepticism. But the economic logic of on-premises for predictable, high-volume workloads remains solid. If you know you'll have 30 GPUs running 24/7, on-premises TCO usually wins.
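
To make that logic concrete, a toy comparison. Every figure here is an illustrative assumption on our part, not a quote from IBM, Databricks or any cloud price list; swap in your own numbers.

```python
# Toy TCO sketch: 30 GPUs running 24/7 for a year, cloud vs on-prem.
# All prices are assumptions for illustration only.
gpus, hours = 30, 24 * 365

cloud_rate = 4.00                               # assumed $/GPU-hour
cloud_year = gpus * hours * cloud_rate          # ~ $1.05M per year

capex = 1_500_000                               # assumed rack, GPUs, storage
amort_years = 4                                 # straight-line amortisation
opex_year = 200_000                             # assumed power, cooling, support
onprem_year = capex / amort_years + opex_year   # ~ $575k per year

print(f"cloud:   ${cloud_year:,.0f}/yr")
print(f"on-prem: ${onprem_year:,.0f}/yr")
```

Under these assumptions on-prem comes in at roughly half the cloud bill, and the gap widens the longer the hardware stays busy; at low or bursty utilisation the comparison flips.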

Our take

Real or marketing?

A bit of both, as always. What is unambiguously real:

  • The hardware exists and can be bought today. The H200 and RTX Pro 6000 are available as GPU servers for Fusion HCI. This is not a roadmap.
  • CAS works. The 100-billion-vector demo is verifiable. The Redbook details the architecture step by step.
  • NVIDIA AIDP is a real reference design with early adoption in healthcare (UT Southwestern Medical Center) and finance.
  • Red Hat AI Factory standardises OpenShift + GPU deployment as an AI platform — exactly what Fusion HCI delivers as an appliance.

What deserves some nuance:

  • CAS is not yet in Fusion GA. IBM said Q2 2025, then Q2 2026. It's been integrated in Storage Scale since March 2025, but the embedded Fusion version is still landing.
  • The 60% cost vs Databricks figure is an internal benchmark under controlled conditions. In real production, the benefit will depend on your workload.
  • Fusion HCI is not cheap. A rack with H200 GPUs, 16 storage nodes and OpenShift licences is a significant investment. It makes sense for organisations with sensitive data and predictable workloads — not for an AI pilot.

SIXE take

The most significant part of this wave is not the GPUs — everyone has those. It is CAS. Storage that semantically understands what it holds and maintains a real-time vector database with inherited ACLs is a genuine architectural shift. If it works as promised (and the demos suggest it does), it resolves the two main problems with enterprise RAG: data freshness and access security.

That said, not everyone needs Fusion HCI to benefit. CAS lives in Storage Scale, which can also be deployed as software-defined on your own hardware. And if your data volume does not justify Storage Scale, Ceph with a conventional RAG pipeline remains a viable and more cost-effective alternative.

As always, the answer depends on volume, data sensitivity and budget. We'll help you evaluate it.


Evaluating on-premises AI?

Tell us your use case. We help you size the right solution.

Fusion HCI, Fusion Software, standalone Storage Scale or Ceph — it depends on what you need. We do not sell a single solution; we help you choose the right one.

SIXE