VMware in 2026: Four Paths After the Broadcom Model Change

Virtualization · Infrastructure · 2026

If your vSphere 7 or 8 is humming along, you have zero reason to touch it. Period. What changed is the licensing model, not the platform — and there are four reasonable paths: stay with Broadcom, keep what you have with third-party support, migrate to Proxmox VE, or jump to OpenShift Virtualization. The decision about when — or if — you upgrade is back on your side. We run any of the four.

10 min read · Guide

30-second TL;DR: four reasonable paths for your VMware in 2026 — stay with Broadcom (VCF/VVF), keep your vSphere on third-party support, migrate to Proxmox VE, or move to OpenShift Virtualization. All four work; the fit depends on your stack, your timeline, and how much budget you feel like committing to the vendor.

We were running VMware long before Broadcom acquired it, and we still do. We also do third-party support for enterprise software, migrations to Proxmox and OpenShift, and official training in whatever you need. No favourite horse, just very different customers; this post is what we tell them when they ask "OK, so what do I do?".

4 · Reasonable paths in 2026
2023 · Broadcom completes VMware acquisition
15+ · Years of SIXE in enterprise virtualization

01 · Context

What changed at VMware after the Broadcom acquisition?

Broadcom closed the VMware acquisition on November 22, 2023. From that point, several things changed on the commercial side — outlined in Broadcom's official portfolio changes article. The technical platform is still the same. What changed is how you buy it, not how it runs.

  • New perpetual licenses are gone. The catalog switched to subscriptions, mainly through two bundles: VMware Cloud Foundation (VCF) and VMware vSphere Foundation (VVF).
  • Catalog reorganized. Products previously sold separately (vSAN, NSX, Aria) are now integrated into the bundles, which changes the per-server licensing math.
  • Partner program reset. Channel agreements were renegotiated and many customers now deal directly with Broadcom or with a tier-one partner.
  • Existing perpetual licenses remain valid: customers who already bought them can keep running them, though without new updates if they don't keep a support contract.
In context

It's a commercial-model change, not a technical fault in VMware. For some organizations the transition is manageable; for others — especially those that don't use all bundle components — the bill multiplies. Do the math before you renew, not after.

02 · Decision

Why are so many companies rethinking VMware in 2026?

We hear it in almost every conversation — three reasons that usually show up together:

  1. The bill goes up. Switching to a bundle with components you don't use (vSAN, NSX, Aria) multiplies the per-core cost.
  2. The clock is ticking. Multi-year contracts expiring, and the classic email of "your renewal is in 90 days, please confirm".
  3. The platform is aging. vSphere 7 reached End of General Support on October 2, 2025; vSphere 8 is scheduled for 2027.

Not everyone experiences this the same way. If you have 3 hosts and a modest cluster, the adjustment is manageable. If you have 300 hosts with stretched vSAN, opening the option set has a clear and quick return. The question is no longer "which version do I buy?" — it's now "what do I do with everything I already have deployed and running?".

03 · The four paths

What are your four real options in 2026?

There are four reasonable paths today. There's no universal answer: the fit depends on how much VMware you have, what runs on top of it, who operates it, and what timeline you're managing.

1. Stay with Broadcom (VCF / VVF)
  • Who it fits: you actually use the full bundle (vSAN, NSX, Aria) and have budget for the new subscription model.
  • What you gain: complete platform, official roadmap, direct vendor support, access to new versions.
  • What it costs: annual subscription, bill tied to core count.

2. Keep your VMware with third-party support
  • Who it fits: you're stable on vSphere 7 or 8 and want to stay that way for several more years, with savings on licensing and autonomy over when (or if) to migrate.
  • What you gain: current version kept secure and operational; contractual SLA; budget freed for investment or headcount; time to evaluate Proxmox or OpenShift without pressure.
  • What it costs: no new vendor versions, no direct Broadcom support while the contract lasts.

3. Migrate to Proxmox VE
  • Who it fits: teams looking for a general-purpose open source hypervisor equivalent to vSphere for VMs and LXC containers.
  • What you gain: open platform, no per-host licenses, optional enterprise support subscriptions, mature ecosystem.
  • What it costs: migration project (P2V/V2V), team re-skilling, some automation rewrites.

4. Migrate to OpenShift Virtualization
  • Who it fits: you're already heading toward containers and Kubernetes and want to consolidate VMs and pods on a single platform.
  • What you gain: one platform for VMs and containers, native CI/CD and Kubernetes networking integration, Red Hat / IBM enterprise support.
  • What it costs: Kubernetes adoption curve, network and storage redesign, wave-based migration plan.

All four work. What we do NOT recommend: deciding against the clock with the renewal looming. In that scenario you almost always end up renewing by default, without having compared anything — and that's the worst option of all, because you didn't even choose it.

Which path fits you?

Three quick questions help narrow down which of the four paths makes the most sense in your case:

  1. When does your current VMware contract expire?
  2. Where is your platform heading in the next 3 years?
  3. How "VMware-heavy" is your stack today?

04 · Path 2 in detail

What is third-party support for VMware?

It's not "tech support" in the usual sense. It's what happens when the vendor stops being your maintenance provider and someone else steps in to do that work — us, in this case. We keep your vSphere 7 or 8 stable, patched and under contractual SLA while you decide what to do long-term. It does not replace vendor version upgrades. It does replace the maintenance contract — which is where the money goes.

Customers hire us for four reasons, in varying order:

  • Staying on vSphere 7 or 8 for another five to ten years without anyone pushing you to upgrade. If your platform is stable, you don't need a new version — you need yours to stay secure.
  • Getting the lever back: you set the timeline, not Broadcom's End of General Support date.
  • Pulling money out of software maintenance and putting it where it adds value — new hardware, a project you've been putting off for a year, one more person on the team.
  • A single contract for the entire stack (hypervisor, hardware, guest OS). You stop collecting vertical contracts with whichever vendor.

It fits especially well when your vSphere is solid and you want to stretch it while evaluating Proxmox or OpenShift calmly — or neither, if you change your mind on the way. We cover vSphere 7 and 8 from Spain, in English (and Spanish, and French), with an assigned engineer who doesn't rotate. Scope, SLA and process: third-party VMware support. Same approach for SAP lives at the third-party support hub.

05 · Paths 3 and 4 in detail

And if I want to migrate? Proxmox or OpenShift?

It depends on where your platform is heading over the next five years.

Proxmox VE — the lateral migration

Proxmox VE is the natural answer if what you have is a classic VMware shop — Windows and Linux VMs, shared storage, backups — and you want an open source hypervisor that resembles what you already operate. It supports VM import from VMware, runs on KVM and LXC, and offers enterprise support subscriptions. It's a lateral migration, not a paradigm shift.

OpenShift Virtualization — Kubernetes consolidation

OpenShift Virtualization (based on KubeVirt) is the answer if your organization is already heading toward containers and you want a single platform for VMs and pods. You can run virtual machines as Kubernetes resources alongside your containerized applications, sharing the same network and storage. There's more learning curve, but also more runway if your roadmap is cloud-native. This is also the area where our GSC data shows the most interest from English-speaking visitors — VMware-to-OpenShift migration has become a real conversation in 2026.

A third destination: OpenStack

There's a third destination worth keeping in mind: OpenStack with Ceph is still a solid option for large environments that want to operate a complete private cloud. Choosing between the three isn't ideological; it's about technical fit and team capability.

06 · Cost and timing

How much does it cost to migrate from VMware? And how long does it take?

There's no catalog price. Anyone giving you a number without looking at your infrastructure is making it up. What it takes and what it costs depends on three things:

  • Estate size: number of hosts, number of VMs, TB size, presence of vSAN/NSX.
  • Network and storage complexity: distributed switches, microsegmentation, replication, DR.
  • Upper-stack dependencies: backup, monitoring, automation, CI/CD.

As planning reference:

  • A migration of dozens of VMs from vSphere to Proxmox or OpenShift typically runs in weeks, with controlled per-service windows.
  • A project on hundreds of VMs runs in months, in waves, using a strangler pattern: new workloads on the destination platform, existing workloads migrated in blocks.
Key insight

What lengthens (and increases the cost of) projects usually isn't the hypervisor — it's the dependencies hanging off it: backup, DR, legacy automation, CMDB integrations. That's why the first deliverable of any serious migration is an inventory and dependency map, not a PoC of the new hypervisor.

07 · Method

What's the recommended order to decide?

This is the procedure we apply at SIXE when a customer asks "what do I do with VMware?".

SIXE method · 5 steps before renewing or migrating

  1. Real estate inventory. Hosts, cores, VMs, vSAN, NSX, Aria, active contracts and expiration dates. Without this, any calculation is fiction.
  2. Upper-stack dependency map. Backup, DR, monitoring, automation, CMDB integrations. The real timelines, and the hidden risks, live here.
  3. Three comparable numbers. Cost of renewing with Broadcom under the new model · Cost of third-party support for 12-24 months · Estimated migration cost (labour + tooling + training).
  4. Informed decision, or combination. The four paths can be combined. Frequent pattern: third-party support for 18 months + progressive migration to Proxmox for standard workloads and to OpenShift for those heading to containers.
  5. Wave-based execution plan. Quarterly review. The point is to separate the technical decision from the commercial calendar: the lever is in your hand, not in the renewal date.
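
The three comparable numbers from step 3 are easier to argue about once they sit side by side over the same horizon. The sketch below shows only the arithmetic. Every figure in it (core count, per-core price, support fee, migration estimate) is an arbitrary placeholder, not a Broadcom, SIXE or market price, and the helper name `total_cost` is ours for illustration. The ranking it prints means nothing until you plug in your own inventory and real quotes.

```shell
#!/usr/bin/env bash
# Sketch: compare the three numbers from step 3 over one time horizon.
# ALL figures are hypothetical placeholders; replace them with your own
# inventory (step 1) and real quotes before drawing any conclusion.

total_cost() {            # usage: total_cost <one_off> <yearly> <years>
  echo $(( $1 + $2 * $3 ))
}

cores=256                     # hypothetical: total licensed cores
years=2                       # comparison horizon
renew_per_core_year=100       # hypothetical subscription price per core and year
tps_per_year=9000             # hypothetical third-party support fee per year
migration_one_off=30000       # hypothetical labour + tooling + training
new_platform_per_year=4000    # hypothetical support subscription after migrating

renew=$(total_cost 0 $(( cores * renew_per_core_year )) "$years")
third_party=$(total_cost 0 "$tps_per_year" "$years")
migrate=$(total_cost "$migration_one_off" "$new_platform_per_year" "$years")

# printf reuses the format string for each label/value pair.
printf '%-22s %8d\n' \
  "Renew with Broadcom" "$renew" \
  "Third-party support" "$third_party" \
  "Migrate"             "$migrate"
```

Keeping the one-off and the yearly component separate is deliberate: a migration is front-loaded, a subscription is not, so the comparison changes with the horizon you choose.
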

08 · Team

What about the team's training?

Any of the four paths requires training, though in different proportions:

  • Stay on VMware under Broadcom: little new training, mostly on the licensing model and bundles.
  • Third-party support: no additional training; the team keeps operating what they already know.
  • Proxmox VE: moderate training; the mental model resembles vSphere but tooling and networking differ.
  • OpenShift Virtualization: significant Kubernetes training; start it before the migration, not during.

We're a training partner for several of these platforms: VMware, Red Hat (including OpenShift), and KVM-based solutions. If you want to keep your team certified on what they already operate, we run official VMware vSphere training. When migration is paired with early training, waves go faster and with fewer incidents.

Four paths for your VMware infrastructure in 2026 — all of them valid; it depends on your stack, your timeline and your budget.
10 · Our position

Is VMware a bad platform after Broadcom?

No. And it's worth saying. Here are things we could say for easy clicks, and won't:

  • That VMware is a bad product. It isn't — we've supported vSphere for more than fifteen years.
  • That Broadcom is the villain. It's a legitimate commercial decision by a vendor. Customers have to decide what to do with it, not insult whoever made it.
  • That Proxmox or OpenShift are "the answer". They're an answer when they fit. In other cases, what fits is staying on VMware — with or without Broadcom.
What we will say

Don't decide in a rush. Do the math with all four. And if you need time to do it right, third-party support exists precisely for that. We've been running VMware since before Broadcom existed. We also run Proxmox, OpenShift and OpenStack. Whatever you decide, we execute — that's the difference.

Summary

The essentials in 5 points

If you skipped here

→ The model changed, not the platform. Subscription (VCF/VVF) instead of new perpetuals. The perpetuals you already own are still yours.

→ Four reasonable paths: stay with Broadcom, third-party support, migrate to Proxmox or move to OpenShift. All four work.

→ Third-party support = buying time without compromising security. Your vSphere 7 or 8 stays patched and SLA-covered, but the contract is no longer with Broadcom.

→ Don't start with the new-hypervisor PoC. Start with the inventory and dependency map. The hypervisor is almost never the hard part.

→ The four paths combine. Most frequent pattern: 12-18 months of third-party support + progressive migration in parallel.

FAQ

Frequently asked questions

Is third-party VMware support legal?

Yes — and it's not a grey area. It covers operational maintenance of the versions you already have deployed, without redistributing software or modifying licenses. It does not replace vendor version upgrades — it replaces the maintenance contract, which is something else.

Can I keep using VMware if I don't renew with Broadcom?

If you have prior perpetual licenses, yes: you can keep running them. What you lose is access to new updates and direct vendor support. To cover that gap, there's third-party support for vSphere 7 and 8 with a contractual SLA.

When does support for vSphere 7 and vSphere 8 end?

VMware vSphere 7 reached End of General Support on October 2, 2025, per Broadcom's official communication. vSphere 8 is currently scheduled for 2027. Dates are updated periodically in the official lifecycle portal.

Is Proxmox VE a serious enterprise alternative?

Yes, and anyone still claiming otherwise hasn't looked at Proxmox in years. It runs in production in serious organizations, has enterprise support subscriptions and a mature ecosystem of backup, high availability and clustering. The difference with VMware isn't about technical maturity — it's about model (open source vs. proprietary) and tooling.

Does OpenShift Virtualization replace vSphere?

For many workloads, yes. It runs VMs as Kubernetes objects and lets you consolidate VMs and containers on a single platform. If your organization isn't moving toward Kubernetes in the next few years, it's not for you. If you already are, it's one of the strongest cards on the table.

How much exactly will I save by migrating or moving to third-party support?

It depends on your estate and current model. Significant savings are common in large infrastructures with bundles that aren't used at 100% — but you can only state a number after running the math with your numbers. Any percentage without your inventory is marketing.

Same approach for SAP?

Yes. We apply the same "you decide, we execute" logic to SAP infrastructure — IBM Power, AIX, Linux for SAP HANA. It lives at third-party SAP support.

Second opinion, no fluff

Want to see the four paths with your numbers, not ours?

We'll put together a short report with the real cost of each option applied to your actual inventory: stay with Broadcom, third-party support, Proxmox or OpenShift. First conversation is free — if we don't fit, we don't fit. If we do, you'll tell us.

How to Upgrade Debian 12 to Debian 13 Trixie (2026 guide)

Linux · Debian · Systems

How to upgrade Debian 12 to Debian 13 Trixie without breaking production.

Debian 13 "Trixie" is now the stable release. This is the step-by-step method we use at SIXE to upgrade production servers: preparation, repositories, full-upgrade and a tested rollback plan. No surprises.

8 min read · Technical guide

To upgrade from Debian 12 Bookworm to Debian 13 Trixie: back up, fully patch Bookworm, point the repositories from bookworm to trixie, run apt upgrade --without-new-pkgs first and then apt full-upgrade, clean up and reboot.

Simple on a lab box. A different story across a production fleet with databases, critical services and SLAs. At SIXE we have spent more than 15 years keeping Linux infrastructure running in production, and this is the exact methodology we use to run a dist-upgrade without unplanned downtime. Only the 12 → 13 jump is supported: if you are on Debian 11, move to Debian 12 first.

2030 · Trixie support (3 yrs + 2 LTS)
~59,000 · Packages in the repositories
20-60 min · Typical upgrade duration

01 · What's new

What is Debian 13 Trixie and what changed from Bookworm?

Debian 13 "Trixie" has been the stable release of Debian since 9 August 2025. It ships the Linux 6.12 LTS kernel, the completion of the /usr merge (now mandatory), the move to 64-bit time_t (preparing for the Year 2038 problem on 32-bit architectures) and APT 3.0, with cleaner output and better dependency resolution.

For a production server it is not a revolution but exactly what you expect from Debian: a cleaner base, a modern kernel and five years of security support ahead. The key operational point is that the support clock resets — and Bookworm's clock is starting to run out.

In context

Trixie does not force you to relearn anything: same APT, same philosophy. The effort is in planning the jump, not adapting to a new system.

02 · Lifecycle

How long is Debian 12 Bookworm supported?

When Trixie was released in August 2025, Debian 12 became oldstable. The security team keeps supporting oldstable for roughly one year after a new stable release (until about June 2026 in this case); after that only LTS security support remains, until roughly June 2028. It is not a critical emergency, but every month you wait adds technical debt and shrinks the window to upgrade calmly.

Figure: support lifecycle, Debian 12 vs Debian 13, on a 2023-2030 timeline. Debian 12 "Bookworm": LTS until Jun 2028. Debian 13 "Trixie": supported until Jun 2030. Dates are approximate, per the Debian lifecycle.

Recommendation

Do not wait until the end of Bookworm's cycle. Plan the upgrade in a quiet window, not against the clock when security patches stop arriving.

03 · Preparation

What should you prepare before upgrading?

A dist-upgrade is safe if you prepare it. Most disasters we have seen come not from the upgrade itself but from skipping this phase. Tick every item before you start:

Pre-flight checklist

  • Full backup or snapshot. On a VM, take a full snapshot: it is your rollback button.
  • Read the Debian 13 release notes for your case (known issues).
  • Enough disk space in / and /var to download the new packages.
  • Out-of-band access (console / IPMI / KVM) in case SSH drops during the process.
  • Clean up /boot of old kernels so the 6.12 kernel fits.
  • Maintenance window agreed and users notified.

All six ticked? Then yes, go ahead with the upgrade.
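
The disk-space items on the checklist are easy to script. The following is a minimal sketch, not a Debian tool: `avail_mb` and `check_space` are hypothetical helper names, and the thresholds in the example call are illustrative guesses rather than official requirements, so size them to your own package set.

```shell
#!/usr/bin/env bash
# Sketch: check free space on the filesystems the upgrade will write to.
# Helper names and thresholds are assumptions for illustration.

avail_mb() {              # free space in MiB on the filesystem holding $1
  df -Pm "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

check_space() {           # usage: check_space <path> <minimum MiB>
  local free
  free=$(avail_mb "$1")
  if [ -z "$free" ]; then
    echo "FAIL $1: cannot read free space"
    return 2
  fi
  if [ "$free" -ge "$2" ]; then
    echo "OK   $1: ${free} MiB free (need $2)"
  else
    echo "FAIL $1: ${free} MiB free (need $2)"
    return 1
  fi
}

# Example call (thresholds are placeholders):
#   check_space / 5000 && check_space /var 5000 && check_space /boot 300
```

A non-zero exit status from any check is a reason to stop and free space before touching the repositories.
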

04 · Step by step

How to upgrade Debian 12 to Debian 13 step by step

The process has six phases: bring Bookworm up to date, point repositories to trixie, disable third-party repos, run the minimal upgrade first and then the full one, clean up, and reboot with verification. Here is the flow and the exact commands.

The upgrade in 6 phases

  1. Patch Bookworm: start from a 100% up-to-date system
  2. Repos to Trixie: bookworm → trixie (incl. trixie-security)
  3. Disable third-party repos: re-enable them one by one afterwards
  4. Minimal + full upgrade: --without-new-pkgs then full-upgrade
  5. Clean up: autoremove + autoclean
  6. Reboot & verify: cat /etc/debian_version → 13.x

1. Bring Debian 12 fully up to date

Any pending patch turns into a conflict during the jump. Start from a pristine Bookworm:

root@debian12 — bash
$ sudo apt update
$ sudo apt upgrade
$ sudo apt full-upgrade
$ sudo apt --purge autoremove

2. Point the repositories to Trixie

Replace bookworm with trixie in your APT sources. This also covers bookworm-security → trixie-security. Review the result with cat before continuing.

edit APT sources
# Classic format
$ sudo sed -i 's/bookworm/trixie/g' /etc/apt/sources.list

# DEB822 format (recent Debian 12 installs)
$ sudo sed -i 's/bookworm/trixie/g' /etc/apt/sources.list.d/debian.sources
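
Since the substitution edits files in place, it can help to keep a copy and look at the diff before running apt update. A minimal sketch of that idea follows; `switch_release` is a hypothetical helper name, not part of APT, and it applies the same sed expression shown above. It assumes GNU sed and a root shell.

```shell
#!/usr/bin/env bash
# Sketch: switch one APT sources file from bookworm to trixie with a backup.
# switch_release is an illustrative helper, not an APT command.

switch_release() {        # usage: switch_release <sources file>
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "skip: $file not found"
    return 0
  fi
  cp -p "$file" "$file.bookworm.bak"            # keep the original next to it
  sed -i 's/bookworm/trixie/g' "$file"          # same substitution as above
  diff -u "$file.bookworm.bak" "$file" || true  # show exactly what changed
}

# Example, from a root shell:
#   switch_release /etc/apt/sources.list
#   switch_release /etc/apt/sources.list.d/debian.sources
```

The backup name ends in .bak on purpose: APT only reads files ending in .list or .sources, so the copy does not act as a second source.
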

3. Temporarily disable third-party repositories

Any external repo in /etc/apt/sources.list.d/ (Docker, PostgreSQL, etc.) can block the upgrade if it does not support Trixie yet. Disable them now and re-enable them one by one afterwards, checking that each already publishes for Debian 13.
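
One way to disable external repositories without losing them is to rename the files, because APT only reads entries in that directory that end in .list or .sources. The sketch below assumes the only official file there is debian.sources; `disable_third_party` is a hypothetical helper name, and you should adjust the exclusion if your Debian sources live under another filename.

```shell
#!/usr/bin/env bash
# Sketch: park third-party APT sources by renaming them, so each one can be
# restored individually later. Helper name and the debian.sources exclusion
# are assumptions for illustration.

disable_third_party() {   # usage: disable_third_party [directory]
  local dir="${1:-/etc/apt/sources.list.d}"
  local f
  for f in "$dir"/*.list "$dir"/*.sources; do
    [ -e "$f" ] || continue                       # glob matched nothing
    if [ "$(basename "$f")" = "debian.sources" ]; then
      continue                                    # keep the official sources
    fi
    mv "$f" "$f.disabled"
    echo "disabled: $f"
  done
}

# Re-enable one repository after checking it publishes for trixie:
#   mv /etc/apt/sources.list.d/NAME.list.disabled /etc/apt/sources.list.d/NAME.list
```

Renaming rather than deleting keeps signing-key references and options intact for the moment each vendor is ready for Debian 13.
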

4. Run the upgrade: minimal first, full second

The minimal upgrade reduces the risk of dependency conflicts. Monitor the process: you will answer a few prompts about modified configuration files.

root@debian12 — dist-upgrade
$ sudo apt update
$ sudo apt upgrade --without-new-pkgs   # minimal upgrade
$ sudo apt full-upgrade                 # full upgrade

5. Clean up obsolete packages

clean up
$ sudo apt --purge autoremove -y
$ sudo apt autoclean

6. Reboot and verify

root@debian13 — verification
$ sudo reboot
# after reboot:
$ cat /etc/debian_version   # -> 13.x
$ uname -r                  # -> 6.12.x
$ systemctl --failed        # no failed services
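
Across a fleet, the same three checks are easier to trust as a pass/fail script than as eyeballed output. A minimal sketch, with illustrative function names, that wraps the commands shown above:

```shell
#!/usr/bin/env bash
# Sketch: turn the manual post-reboot checks into a pass/fail result.
# Function names are illustrative.

is_trixie() {             # usage: is_trixie "<contents of /etc/debian_version>"
  case "$1" in
    13|13.*) return 0 ;;
    *)       return 1 ;;
  esac
}

verify_upgrade() {
  local version failed
  version=$(cat /etc/debian_version 2>/dev/null)
  if ! is_trixie "$version"; then
    echo "FAIL: debian_version is '$version'"
    return 1
  fi
  # --no-legend drops the header and footer, leaving one line per failed unit.
  failed=$(systemctl --failed --no-legend 2>/dev/null | wc -l)
  if [ "$failed" -ne 0 ]; then
    echo "FAIL: $failed failed unit(s)"
    return 1
  fi
  echo "OK: Debian $version, kernel $(uname -r), no failed units"
}

# Example: verify_upgrade || echo "investigate before closing the window"
```

The version check is split into its own function so it can be reused against output collected from remote hosts.
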
05 · Rollback

How long does it take and can you roll back?

On a typical server the upgrade takes 20 to 60 minutes depending on the number of packages and disk and network speed. Rolling back a dist-upgrade is not trivial once the packages are installed: that is why the prior snapshot is non-negotiable. On VMs, reverting to the snapshot takes minutes; on physical hardware, the rollback plan is restoring from backup.

Golden rule

Never do a major version jump in production without a tested way back. A backup you have never restored is not a backup: it is a hope.

06 · Common errors

What are the most common errors upgrading to Trixie?

What we see most in production, and how to avoid it:

  • Third-party repos without a Trixie release blocking apt: disable them first (step 3).
  • Full /boot preventing the new kernel from installing: clean old kernels before you start.
  • Configuration files overwritten by accepting the package version blindly: when in doubt, keep yours and review afterwards.
  • Forgetting --without-new-pkgs in the minimal phase, which triggers dependency conflicts.
  • Custom services assuming non-merged /usr paths: the /usr merge is now mandatory in Trixie.
07 · In context

Debian or Ubuntu for your server?

The question we get most often. There is no universal answer: it depends on whether you value control and independence (Debian) or integrated commercial support and tools like Ubuntu Pro and Landscape (Ubuntu). Here is the quick comparison:

| | Debian 13 | Ubuntu Pro | RHEL |
|---|---|---|---|
| Governance | Community | Canonical | IBM / Red Hat |
| Support / release | 5 yrs + ELTS | 10 yrs | 10 yrs |
| Packages | ~59,000 | ~30,000 | ~5,000 |
| Licence cost | €0 | Per node | Per node |
| Local-language support | SIXE | SIXE UP | SIXE |

Running Ubuntu? As a Canonical partner we have you covered too with SIXE UP. And if your case is moving between distributions, we handle migrations with no downtime.

Upgrading Debian 12 Bookworm to Debian 13 Trixie on production infrastructure
08 · Professional support

What if you'd rather not touch production yourself?

Upgrading a lab box is an afternoon. Upgrading a production fleet with SLAs, databases and critical services is another story: you have to inventory dependencies, validate in staging, coordinate windows and have a tested rollback.

That is exactly what we do at SIXE. We offer professional Debian support — planned version upgrades, hardening with Wazuh monitoring, and incident resolution with SLA — and, when it is urgent, 24/7 support. Want your team trained? We run official Linux training.

15+ years keeping Linux in production

Senior engineers who speak your language, no helpdesks, no escalations. Need to upgrade several servers to Debian 13? Tell us about your case and we will propose a plan with a maintenance window and rollback.

Summary

The essentials in 5 points

For the busy reader

→ Debian 13 Trixie has been the stable release since August 2025; Debian 12 is now oldstable (LTS until ~2028).

→ Back up / snapshot before anything else. It is your only real rollback.

→ Start from a 100% patched Bookworm, switch repos to trixie and run upgrade --without-new-pkgs before the full-upgrade.

→ Disable third-party repos during the process.

→ Only 12 → 13 is supported: from Debian 11, go through Debian 12 first.

FAQ

Frequently asked questions

Can you upgrade from Debian 11 directly to Debian 13?

No. Debian only supports upgrades between consecutive releases. From Debian 11 "Bullseye" you must first upgrade to Debian 12 "Bookworm" and, once there, to Debian 13 "Trixie".

Do I lose my data and configuration when upgrading?

No, a dist-upgrade preserves data and configuration. Even so, a full backup or snapshot is mandatory: it is your rollback plan if anything fails.

Is it better to upgrade or reinstall from scratch?

For a well-maintained server, the dist-upgrade is safe and much faster. A clean reinstall only pays off if the system carries a lot of technical debt or inconsistent configuration.

Sources

References

Debian. Debian 13 "trixie" Release Information. debian.org/releases/trixie

Debian. Debian 13 Release Notes. debian.org/releases/trixie/releasenotes

Debian. Debian 13 "trixie" released (2025-08-09). debian.org/News/2025

Debian Security Team. debian.org/security

Written by the SIXE systems engineering team.


Professional Debian support

Need to upgrade your servers to Debian 13?

We propose an upgrade plan with a prior audit, staging validation, a maintenance window and a tested rollback. Senior engineering, with SLA.

NIS2 compliance with Wazuh

Cybersecurity · NIS2 · Wazuh

The NIS2 directive requires monitoring, incident handling and risk management capabilities that most organisations don't yet have in place. Wazuh is a free, open source platform that covers a large part of what the directive demands — but not all of it. This guide explains what it does, what it doesn't, and what you actually need to pass an audit.

12 min read · Cybersecurity · Compliance

NIS2 (Directive (EU) 2022/2555) is the EU directive that raises cybersecurity standards for businesses operating in critical and important sectors. If your organisation provides essential services, or supplies technology to one that does, this applies to you. The fines are real: up to €10 million or 2% of global annual turnover for essential entities.

Wazuh is an open source security platform that centralises your logs, detects threats, monitors vulnerabilities and generates the evidence trail an auditor expects. It's free, widely adopted across European public administrations and research institutions — including CERN — and at SIXE we deploy it in production environments configured specifically for regulatory compliance. This article explains NIS2's requirements in plain language, maps them to Wazuh's capabilities, and is honest about the gaps.

€0 · Wazuh licence cost
10 · NIS2 measures supported
24h · Incident early warning
3K+ · Pre-built rules

01 · Scope

Does NIS2 apply to your organisation?

NIS2 broadens the scope of the original 2016 directive significantly. It applies to essential entities (energy, transport, banking, health, water, digital infrastructure, public administration) and important entities (postal services, waste management, food, manufacturing, chemicals, digital providers). If you supply technology or services to any of these sectors, you may also fall within scope.

Five typical cases:

  • "We're a managed service provider hosting infrastructure for a hospital group." Yes, it applies. Healthcare is an essential sector under NIS2. As a technology supplier to that sector, you fall within scope through the supply chain provisions (Article 21.2.d).
  • "We're a mid-size manufacturing company with 300 employees." Yes, it applies. Manufacturing is classified as an "important" sector under NIS2. Medium and large enterprises in this sector are in scope. Supervision is reactive (post-incident), but the obligations are real.
  • "We're a 15-person marketing agency with no public-sector clients." Probably not. Small enterprises outside the listed sectors are generally not in scope. But if you handle data for clients in essential sectors, check whether your contracts include NIS2 supply chain requirements.
  • "We operate a cloud platform used by several EU public administrations." Yes, it applies. Digital infrastructure providers (cloud, DNS, data centres) are classified as essential entities under NIS2. Proactive supervision applies, and the obligations are the most stringent.
  • "We're an energy company distributing electricity across three EU countries." Yes, essential entity. Energy is a core essential sector. You face proactive supervision, mandatory incident reporting (24h/72h/1 month), and penalties up to €10M or 2% of global turnover.

Essential vs. Important: two levels of obligation

Essential entities (energy, health, transport, digital infrastructure, banking, public administration) face proactive supervision — authorities can audit you at any time. Important entities (manufacturing, food, chemicals, postal, digital providers) face reactive supervision — authorities investigate after an incident or a complaint. The security obligations under Article 21 are the same for both. The difference is how strictly they're enforced.

02 · What it requires

NIS2 describes SIEM capabilities — without using the word

A SIEM (Security Information and Event Management) is a system that collects log data from all your servers and devices, analyses it automatically to find suspicious patterns, and raises alerts when something looks wrong. Think of it as a security camera for your entire IT infrastructure — one that can actually understand what it sees.
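
To make "finding suspicious patterns" concrete, here is a toy version of a single detection rule: count failed SSH logins per source IP and flag the noisy ones. It is only an illustration of the idea and not how Wazuh or any real SIEM is implemented. The function name `flag_bruteforce`, the threshold and the assumed sshd log wording ("Failed password ... from IP") are ours.

```shell
#!/usr/bin/env bash
# Toy illustration of one SIEM-style rule: flag IPs with many failed logins.
# Hypothetical helper; real products correlate far more sources and context.

flag_bruteforce() {       # usage: flag_bruteforce <threshold> < auth.log
  awk -v limit="$1" '
    /Failed password/ {
      for (i = 1; i < NF; i++)
        if ($i == "from") count[$(i + 1)]++   # the IP follows the word "from"
    }
    END {
      for (ip in count)
        if (count[ip] >= limit) print "ALERT", ip, count[ip]
    }' | sort
}

# Example: flag_bruteforce 5 < /var/log/auth.log
```

A real deployment does this continuously, across every host, and keeps the matching events as evidence, which is the part that is hard to do by hand.
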

NIS2 doesn't say "install a SIEM". But Article 21 requires a set of measures that in practice demand SIEM-class capabilities:

  • Incident handling (Art. 21.2.b): detect, respond to and report security incidents. The reporting obligations (Article 23) set a three-stage timeline: early warning within 24 hours, notification within 72 hours, and a final report within one month.
  • Risk management (Art. 21.2.a): continuous risk analysis and policies for information system security.
  • Business continuity (Art. 21.2.c): backup management, disaster recovery, crisis management.
  • Supply chain security (Art. 21.2.d): security of relationships with direct suppliers and service providers.
  • Vulnerability handling (Art. 21.2.e): vulnerability disclosure and management.
  • Monitoring and logging (Art. 21.2.f): policies for assessing the effectiveness of cybersecurity measures, which in practice means logging and monitoring of network and information systems.

Meeting the 24-hour early warning requirement without automated detection is extremely difficult. Detecting incidents, preserving evidence, and reporting within the required timelines is exactly what a SIEM does.
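
The three deadlines are simple date arithmetic, which makes them a good candidate for an incident runbook. The sketch below takes the moment you became aware of the incident and prints the 24-hour and 72-hour deadlines. For the final report it assumes the one-month clock starts at the 72-hour notification deadline, so it shows the latest possible date; check that reading against your national transposition. It requires GNU date, and `nis2_deadlines` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: NIS2 reporting deadlines from the time an incident is detected.
# Assumptions: GNU date; the final report is counted as one month after the
# 72-hour notification deadline. Helper name is illustrative.

nis2_deadlines() {        # usage: nis2_deadlines "YYYY-MM-DD HH:MM:SS UTC"
  local t0 notify_epoch notify_day
  local fmt='+%Y-%m-%d %H:%M UTC'
  t0=$(date -u -d "$1" +%s) || return 1
  notify_epoch=$(( t0 + 72 * 3600 ))
  notify_day=$(date -u -d "@$notify_epoch" +%Y-%m-%d)
  echo "early_warning: $(date -u -d "@$(( t0 + 24 * 3600 ))" "$fmt")"
  echo "notification:  $(date -u -d "@$notify_epoch" "$fmt")"
  echo "final_report:  $(date -u -d "$notify_day +1 month" +%Y-%m-%d)"
}

# Example: nis2_deadlines "2026-01-10 08:00:00 UTC"
```

The hour offsets are computed on epoch seconds rather than with relative date strings, which avoids GNU date reading a "+24" after a time of day as a time zone offset.
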

Key takeaway

NIS2 doesn't mandate a specific tool. It prescribes outcomes — incident detection, response, logging, risk management. Using Wazuh to deliver those outcomes is a technical and economic decision, not a regulatory requirement.

Where ISO 27001 fits in

ISO 27001:2022 is a voluntary international standard for information security management. Many organisations pursue it alongside NIS2 because the controls overlap significantly. Annex A controls A.8.15 (Logging), A.8.16 (Monitoring activities), A.8.8 (Vulnerability management) and A.8.7 (Malware protection) map directly to Wazuh capabilities. If you're building for NIS2 with Wazuh, you're covering much of ISO 27001's technical layer at the same time.

03 · What Wazuh covers

Which NIS2 and ISO 27001 measures does Wazuh support?

Wazuh is a free, open source platform that combines several security functions in a single product: it centralises logs from all your devices, watches for changes in critical files, detects known vulnerabilities, checks that your servers follow security best practices, and can automatically block an IP that's trying to brute-force its way in.

It doesn't cover everything NIS2 or ISO 27001 requires — no product does on its own. But it supports the technical measures that require monitoring, detection and traceability:

NIS2 / ISO 27001 requirement | Wazuh capability | What it does, in plain terms
Incident detection (Art. 21.2.b) | Real-time log analysis | Collects and correlates events across all endpoints to spot threats
Incident response (Art. 21.2.b) | Active Response | Blocks IPs, isolates hosts, kills processes automatically on alert
Vulnerability handling (Art. 21.2.e) | Vulnerability Detection | Scans installed packages against CVE databases to find what needs patching
Logging (A.8.15) | Centralised log management | Collects, normalises and archives logs from every monitored system
Monitoring (A.8.16) | Continuous monitoring | 24/7 monitoring of all agents with dashboards and alert rules
Malware protection (A.8.7) | FIM + YARA + VirusTotal | File integrity monitoring, signature scanning, malware detection
Configuration management (A.8.9) | Security Configuration Assessment | Checks your servers against CIS Benchmarks and flags deviations
Access control monitoring | Authentication rules | Detects brute force attempts, failed logins and suspicious access patterns
Intrusion detection (Art. 21.2.a) | Suricata + MITRE ATT&CK | Network IDS with custom rules mapped to known attack techniques
Effectiveness assessment (Art. 21.2.f) | Dashboards + reporting | Real-time compliance and operational metrics for auditors and management

How enforcement differs: important entity vs essential entity
  • Art. 21: all security measures (same for both)
  • Art. 23: incident reporting (24h / 72h / 1 month)
  • Art. 32: proactive supervision by authorities
  • Art. 34: fines up to €10M or 2% of turnover
  • Art. 32.5: management personal liability
  • Scope: random audits, on-site inspections
Important entity: same security obligations as essential entities, but reactive supervision (authorities investigate after incidents). Fines up to €7M or 1.4% of turnover.

An important nuance: the table above shows Wazuh's technical capabilities. But a tool doesn't "comply" with NIS2 — your organisation does. Wazuh is the instrument; policies, procedures, governance and an incident response plan are the framework that gives it legal validity.
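
To illustrate what "detects brute force attempts" means in the table above, here is a minimal Python sketch of sliding-window detection over failed logins. It is not Wazuh rule syntax (Wazuh expresses this kind of logic in its own rule format); the thresholds and the event tuple layout are assumptions chosen for the example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 120   # hypothetical look-back window
MAX_FAILURES = 5       # hypothetical threshold before an alert fires


def brute_force_sources(events, window=WINDOW_SECONDS, threshold=MAX_FAILURES):
    """Return source IPs with at least `threshold` failed logins inside `window` seconds.

    `events` is an iterable of (timestamp_seconds, source_ip, success) tuples,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)   # source_ip -> timestamps of recent failures
    flagged = set()
    for ts, ip, success in events:
        if success:
            continue
        failures = recent[ip]
        failures.append(ts)
        # Drop failures that have fallen out of the sliding window.
        while failures and ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(ip)
    return flagged
```

In a real deployment, the flagged addresses would be what an automated response acts on, for example by blocking the IP.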

04 · What it doesn't cover

What Wazuh does NOT do on its own

This is the section most vendors skip. It's also the one that builds the most credibility.

Wazuh does not replace

A risk management framework. NIS2 requires a formal, ongoing risk analysis process. Wazuh detects threats, but it doesn't assess business risks or define risk appetite.

Governance and policies. You need documented security policies approved by management. NIS2 Article 20 makes management directly accountable. No tool writes these for you.

Incident reporting. Wazuh detects incidents and preserves evidence. But filing the 24h early warning, 72h notification and final report with your national authority is an organisational process, not a software feature.

Supply chain security. Article 21.2.d requires you to manage risks in your supply chain. Wazuh monitors your own infrastructure, not your suppliers'.

Business continuity planning. Backup strategy, disaster recovery, crisis management — these are organisational capabilities. Wazuh can monitor backup integrity, but it doesn't design your DR plan.

A team that reviews alerts. A SIEM that nobody looks at is a security camera with the monitor turned off. Wazuh generates alerts; if nobody triages and acts on them, compliance is only on paper.

Understanding these boundaries doesn't weaken the case for Wazuh — it strengthens it. When you know what it does and what it doesn't, you can build a realistic project instead of one that falls apart at the first audit.

05 · No product certification

NIS2 doesn't certify products — it demands outcomes

Unlike some national frameworks that maintain catalogues of "approved" security products, NIS2 doesn't prescribe specific tools. There is no "NIS2-certified SIEM" label. The directive requires organisations to implement appropriate technical and organisational measures proportionate to the risks they face.

This means two things:

  • You can use Wazuh. There's no regulatory barrier to using open source tools for NIS2 compliance. What matters is demonstrating that your measures are effective, documented and proportionate.
  • You need to prove it works. During supervision or audit, you'll need to show evidence that your monitoring, detection and response capabilities actually function — through logs, dashboards, incident records and documented procedures.

This is where Wazuh's audit trail and reporting capabilities become valuable: they generate the evidence an auditor needs to see. But the evidence only has value if it's organised, retained and linked to your documented risk management process.

Key for audit readiness

Any Wazuh deployment for NIS2 needs documented procedures that link the tool's outputs to your risk management framework. Detection without documentation is invisible to an auditor.

06 · Migrating from commercial SIEM

Switching from Splunk or QRadar without losing compliance

If you're running a commercial SIEM with a renewal coming up, Wazuh is a viable alternative. The question isn't whether it works — it's how to migrate without creating a gap in your compliance evidence.

From commercial SIEM to Wazuh — without a compliance gap
  1. Audit active rules. Which rules does someone actually look at? Not the default ones nobody touched in five years.
  2. Translate rules to Wazuh. Rewrite custom rules in Wazuh format and build decoders for your internal applications.
  3. Export historical data. Archive past events in a neutral format. Without this, auditors see a gap in your records.
  4. Run in parallel for 4–6 weeks. Both SIEMs operating simultaneously. Validate that Wazuh captures everything the old one did.
  5. Validate with your auditor. Optional but recommended. Get sign-off before decommissioning the old system.
  6. Decommission and document. Document the transition in your risk management records. No gap in your compliance trail.
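
The parallel-run step can be checked with a simple comparison of what each SIEM alerted on. The Python sketch below assumes both tools can export alerts with a host, a category and a timestamp truncated to the minute; those field names are hypothetical and would need mapping from each product's real export format.

```python
def coverage_gap(legacy_alerts, wazuh_alerts):
    """Compare alerts seen by both SIEMs during the parallel run.

    Each alert is a dict with hypothetical keys "host", "category" and "minute"
    (timestamp truncated to the minute), used to match events across tools.
    """
    def key(alert):
        return (alert["host"], alert["category"], alert["minute"])

    legacy = {key(a) for a in legacy_alerts}
    wazuh = {key(a) for a in wazuh_alerts}
    missed = legacy - wazuh   # seen by the old SIEM only: rules to rebuild
    extra = wazuh - legacy    # seen by Wazuh only: new coverage or noise
    coverage = 1.0 if not legacy else len(legacy & wazuh) / len(legacy)
    return {"missed": sorted(missed), "extra": sorted(extra), "coverage": coverage}
```

The `missed` list is the useful output: each entry points at a rule or decoder that still needs to be rebuilt before decommissioning.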

At SIXE we have a strong IBM QRadar practice with official training courses. We know what QRadar does and what Wazuh does, and exactly which pieces need to be rebuilt when you migrate.

07 · The mistake everyone makes

Installing Wazuh out of the box is not NIS2 compliance

Wazuh ships with over 3,000 pre-built detection rules. Most of them don't apply to your environment. If you only run Linux servers but leave Solaris, Windows Server 2012 and AIX rules active, what you get is noise — alerts nobody understands, nobody reads, and after a few weeks, nobody checks.

What the default installation is missing

No NIS2-specific dashboards. The built-in compliance dashboards cover PCI DSS, HIPAA, GDPR and NIST. For NIS2 you need to build them: panels grouped by Article 21 measures with data an auditor can review at a glance.

No decoders for your internal applications. Your internal portals, batch processes and custom software — without decoders tailored to these, their logs arrive as unstructured text and correlation becomes meaningless.
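
As an illustration of what a decoder does conceptually, the Python sketch below turns one raw line from an internal application into named fields. The log format and field names are invented for the example, and real Wazuh decoders are written in Wazuh's own format, not Python.

```python
import re

# Hypothetical log format for an internal batch application, e.g.
# "2026-03-02T09:30:00Z billing-batch[4211]: job=invoice_export user=svc_batch status=FAILED"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<program>[\w.-]+)\[(?P<pid>\d+)\]:\s+(?P<body>.*)$"
)
FIELD_PATTERN = re.compile(r"(\w+)=(\S+)")


def decode(line):
    """Turn one raw log line into a dict of named fields, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Promote key=value pairs from the message body to top-level fields.
    event.update(dict(FIELD_PATTERN.findall(event.pop("body"))))
    return event
```

Once fields such as `user` and `status` exist, rules can correlate on them; without that step the line is just text.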

Installation takes 1–2 days. Tuning takes 4–6 weeks. Installation is what looks like the project. Tuning is the project.

If you already have Wazuh deployed but untuned, tell us what you have — the initial assessment has no commitment.

08 · Preparing for audit

What you need ready before supervision or audit

NIS2 supervision can be proactive (essential entities) or reactive (important entities). Either way, when authorities or auditors come knocking, these are the documents and evidence they'll expect:

  • Risk management documentation — a formal risk analysis linked to the Article 21 measures you've implemented, including which tools support each measure and why.
  • Log retention policy — how long you keep logs, where they're stored and how integrity is maintained. NIS2 doesn't specify a minimum period, but industry practice is 12–24 months depending on your risk assessment.
  • Incident response procedures — who receives alerts, how incidents are triaged, contained and reported within the 24h/72h/1-month timeline.
  • Vulnerability management procedures — how often you scan, how fast you patch, what your SLA is based on severity.
  • NIS2-specific dashboards — compliance views grouped by Article 21 measures, separate from generic PCI/HIPAA panels.
  • Archived evidence — periodic exports of critical events, rule configurations and change logs. Ready to hand over without last-minute scrambling.
  • Continuous improvement records — evidence that you review and update your measures regularly, not just when an audit is announced.
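
One way to support the integrity and archived-evidence items above is to store a hash manifest next to each export. The following Python sketch uses SHA-256 from the standard library; the directory layout and function names are assumptions, and it is only an illustration, not a substitute for a proper retention system.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large exports do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(export_dir):
    """Map every file under `export_dir` to its SHA-256 digest."""
    root = Path(export_dir)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(export_dir, manifest):
    """Return files that are missing or whose digest differs from the stored manifest."""
    current = build_manifest(export_dir)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

Storing the manifest separately from the exports lets you show an auditor that the evidence has not changed since it was archived.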
09 · Training

Training your team to operate Wazuh for NIS2

A monitoring platform that nobody reviews doesn't detect incidents — it just records them. If your IT team doesn't understand what they're looking at in the Wazuh dashboard, alerts accumulate unmanaged and the tool becomes invisible.

The skills needed to operate Wazuh in a compliance context are specific: reading events and distinguishing real alerts from noise, building custom rules for your applications, mapping evidence to NIS2 and ISO 27001 requirements, and responding to incidents within the reporting timelines.

Wazuh training →

Summary

The essentials, for those short on time

In 6 points

NIS2 describes SIEM capabilities in Article 21 — incident handling, monitoring, logging, vulnerability management. It doesn't name a tool.

Wazuh supports 10 NIS2 and ISO 27001 measures related to monitoring, detection and traceability.

There is no NIS2 product certification. The directive prescribes outcomes, not tools. You demonstrate compliance through documentation and evidence.

Wazuh doesn't replace risk management, governance, incident reporting processes, supply chain security or the team that reviews alerts.

Installing is not complying. The gap between "we have Wazuh" and "we're audit-ready" is 4–6 weeks of serious tuning work.

Wazuh is free. Implementing it properly is not.

FAQ

Frequently asked questions

Does NIS2 require a SIEM?

Not by name. But Article 21 requires incident handling, continuous monitoring and logging capabilities that in practice are only achievable with a SIEM or equivalent platform. Wazuh provides most of these capabilities natively.

Can Wazuh help with NIS2 compliance?

Wazuh provides the technical capabilities to support several Article 21 measures — monitoring, detection, vulnerability management, incident response and audit logging. But compliance is demonstrated by the organisation through policies, procedures and governance, not by any single tool.

Is Wazuh certified for NIS2?

There is no NIS2 product certification scheme. NIS2 prescribes outcomes, not specific tools. Organisations choose their own tooling and demonstrate compliance through audits and supervision by national authorities.

Does Wazuh support ISO 27001?

Yes. Wazuh maps to several Annex A controls in ISO 27001:2022 — particularly A.8.15 (Logging), A.8.16 (Monitoring), A.8.8 (Vulnerability management) and A.8.7 (Malware protection). It provides technical evidence that auditors can review during certification.

What's the difference between NIS2 and ISO 27001?

NIS2 is an EU directive, transposed into national law and mandatory for essential and important entities, with fines up to 2% of global turnover. ISO 27001 is a voluntary international standard. Many organisations pursue both: NIS2 for legal compliance, ISO 27001 for the management framework. The technical controls overlap significantly.

How quickly must incidents be reported under NIS2?

Three stages: early warning within 24 hours, incident notification within 72 hours, final report within one month. Wazuh provides the detection and logging to meet these timelines, but the reporting itself is an organisational process.

Sources

References and regulation cited

Directive (EU) 2022/2555 (NIS2). EUR-Lex — Official Journal

Wazuh — Ensuring NIS2 compliance with Wazuh. wazuh.com

Wazuh — Regulatory compliance use cases. wazuh.com

ISO/IEC 27001:2022 — Information security management systems. iso.org

ENISA — NIS2 Directive guidance. enisa.europa.eu

Wazuh — Official documentation. documentation.wazuh.com

Full training catalogue · SIXE.


Wazuh + NIS2

Let's talk about your project

Tell us which NIS2 category applies to you, what you have in place today and when your next audit is expected. You'll leave with an architecture sketch and next steps.

Chain of Thought: Why Your AI Model Does Not Reason

AI · Reasoning · LLM

Chain of Thought: Why Your AI Model Doesn't Reason.

Chain of Thought is not thinking. It is the statistical shape of thinking. Apple, Arizona State and UC Berkeley prove it with data. Here is what it means for anyone deploying AI in production.

9 min read · AI · Production · Infrastructure

Chain of Thought (CoT) is a technique that makes language models generate intermediate steps before answering. While it improves benchmark results, recent research demonstrates it does not constitute genuine reasoning: it is a statistical constraint that mimics the form of human thought.

For organisations deploying AI in production environments, understanding this difference is not a philosophical debate. It is an architectural decision that affects reliability, cost and operational risk. At SIXE we have spent over 15 years designing critical infrastructure where failure tolerance is zero. That experience has taught us a rule that applies equally to an IBM Power cluster and to an AI agent: never trust a single component for anything that cannot go down.

01 · What it is

What is Chain of Thought and why does it look like reasoning?

When you enable "reasoning" or "thinking" mode in models like GPT-5, Claude or DeepSeek, the model generates an intermediate monologue before answering: "okay, first I'll analyse X... now let me consider Y... wait, let me check Z...". In the technical literature this is known as Chain of Thought (CoT).

The problem is that this is not thinking. It is generating text with the statistical shape of human reasoning. The model saw millions of examples of step-by-step reasoning during training and learned to reproduce that pattern. When you ask it to "think", what it actually does is recognise the problem category and fill in the statistical template that best matches.

A concrete example: if you ask a "reasoning" model to size a Ceph storage cluster with 12 OSDs, 3x replication and tolerance for 2 simultaneous node failures, it will return four flawless paragraphs with formulas, failure-domain considerations and a final number. It looks like structured thinking. What it actually did was detect "Ceph sizing problem" and apply the statistical pattern it has seen in hundreds of similar technical documents.

Why does it work? Because most of the time it gets the right answer. The question is what happens when the problem goes off-script.

02 · The evidence

Do LLMs actually reason? What the papers say

CoT works. It measurably improves benchmarks. The relevant question is not whether it works, but why it works. And the answer from the most rigorous studies is uncomfortable.

CoT as a statistical crutch

A team from Arizona State University demonstrated that CoT shines when the problem data falls within the training distribution. As soon as the problem moves outside known territory, performance collapses. It is the difference between a system that has memorised solutions and one that genuinely understands the underlying principles.

CoT as an architectural constraint

CoT is not abstract reasoning: it is a constraint that forces the model to imitate the form of reasoning. Forcing the model to write "first... second... therefore..." makes each generated token influence the next one more coherently. It is an architectural trick that improves the internal coherence of the text, not a cognitive act. The paper Chain-of-Thought Reasoning In The Wild Is Not Always Faithful documents how CoT can give an inaccurate picture of the actual process the model follows to reach its conclusions.

Technical conclusion

CoT is useful for many tasks. But it is not reasoning. It is formal coherence with the appearance of logic.

03 · The decorative

What are "decorative thinking steps" in AI reasoning?

A paper from October 2025 by researchers at UC Berkeley and UC Davis introduced the concept of decorative thinking steps, and their finding is especially relevant for anyone evaluating AI models for production.

The researchers found that many intermediate CoT steps are literally decorative. The model writes things like "wait, let me check... I think I made a mistake... let me recalculate", and then completely ignores that self-correction and delivers the answer it had already decided internally.

The demonstration was elegant: they deliberately perturbed the intermediate steps (changed numbers, altered logic) and checked whether the final answer changed. In many cases, it did not. The conclusion was already decided. The chain of thought was generated afterwards, as post-hoc rationalisation.
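
The perturbation idea can be sketched in a few lines of Python. This is a simplified illustration, not the paper's actual metric or method: `answer_fn` is a hypothetical callable that returns the final answer when the model is forced to continue from a given chain of thought, and `toy_answer` is a stand-in whose answer depends on a single step by construction.

```python
def causal_steps(answer_fn, steps, perturb):
    """Return the indices of steps whose perturbation changes the final answer."""
    baseline = answer_fn(steps)
    influential = []
    for i, step in enumerate(steps):
        altered = steps[:i] + [perturb(step)] + steps[i + 1:]
        if answer_fn(altered) != baseline:
            influential.append(i)
    return influential


def toy_answer(steps):
    """Toy stand-in for a model: only the step holding the arithmetic matters."""
    for step in steps:
        if step.startswith("calc:"):
            a, b = step[len("calc:"):].split("/")
            return float(a) / float(b)
    return None


def toy_perturb(step):
    """Change the number in a calculation step, or scramble a purely textual step."""
    return step.replace("96", "90") if "96" in step else step[::-1]


chain = [
    "Wait, let me recheck the replication factor.",
    "calc:96/3",
    "Let me verify my answer step by step.",
]
print(causal_steps(toy_answer, chain, toy_perturb))
```

Steps that can be altered without moving the answer are the ones the researchers call decorative.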

Which steps are real and which are decorative?

"Hmm, wait. I think I got the replication factor wrong. Let me recalculate from the beginning..."
Decorative. The model already had the answer. The "self-correction" did not change the final result. It is narrative theatre.

"Raw capacity = 12 × 8 TB = 96 TB. With 3x replication: 96 / 3 = 32 TB usable."
Real. This step contains the calculation that determines the final answer. High TTS: the output depends on it.

"Let me verify my answer step by step to make sure I haven't made any calculation errors in the previous estimate..."
Decorative. Pure rhetorical formula. The model does not re-execute any calculation: it has already emitted the answer tokens. It just adds words that look like rigour.

"Interesting question. Before answering, let me consider multiple angles: the failure domain, OSD balancing and metadata overhead..."
Decorative. Listing factors without processing them is not analysis. It is the statistical form of what an expert would do. The model has already chosen the answer.

"With 2-node failure tolerance and 4 OSDs per node, the worst case loses 8 OSDs. Minimum guaranteed capacity: (12−8) × 8 / 3 ≈ 10.7 TB."
Real. Introduces new variables (nodes, OSDs per node) that actually change the result. Without this step, the answer would be different.
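
The two "real" steps above are plain arithmetic, which a short Python sketch can reproduce. It uses the example's own figures (12 OSDs of 8 TB, 3x replication, 4 OSDs per node, 2 failed nodes) and deliberately ignores metadata overhead and fill-ratio limits, so treat it as a rough check and not a sizing tool.

```python
def usable_capacity_tb(osd_count, osd_size_tb, replicas):
    """Usable capacity of a replicated pool, ignoring overhead and fill limits."""
    return osd_count * osd_size_tb / replicas


OSDS, OSD_SIZE_TB, REPLICAS = 12, 8, 3
OSDS_PER_NODE, FAILED_NODES = 4, 2

raw_tb = OSDS * OSD_SIZE_TB
healthy_tb = usable_capacity_tb(OSDS, OSD_SIZE_TB, REPLICAS)
surviving_osds = OSDS - FAILED_NODES * OSDS_PER_NODE
worst_case_tb = usable_capacity_tb(surviving_osds, OSD_SIZE_TB, REPLICAS)

# prints: raw: 96 TB, usable: 32.0 TB, worst case: 10.7 TB
print(f"raw: {raw_tb} TB, usable: {healthy_tb:.1f} TB, worst case: {worst_case_tb:.1f} TB")
```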

A concrete finding: on the AIME dataset, only 2.3% of reasoning steps in the CoT had real causal influence on the model's final prediction. The rest was decoration. (Source: Can Aha Moments Be Fake?, UC Berkeley)

Direct implication

A model explaining well why it reached a conclusion does not mean that conclusion is correct. The explanation is generated alongside (or after) the result, and in many cases it is a justification built on top of a predetermined answer.

04 · Apple

"The Illusion of Thinking": Apple's study that changes everything

If decorative steps demonstrate that CoT does not guarantee reasoning even when it gets the right answer, Apple's study goes one step further: it shows that when the problem gets truly complex, the models give up.

In June 2025, Apple published The Illusion of Thinking, a study that tested state-of-the-art reasoning models on classic computer science puzzles: the Tower of Hanoi, river-crossing problems and other exercises that any first-year student solves with pen and paper.

Model performance with and without CoT by complexity. Based on data from Apple ML Research, "The Illusion of Thinking", June 2025.
Complexity | No CoT | With CoT
Easy | 88% | 72%
Medium | 55% | 82%
Hard | 8% | 10%
On easy problems the model without CoT wins: the extra "thinking" only adds cost and latency.

The most significant finding is the third one. "Reasoning" models do not just fail on complex problems — they reduce computational effort precisely when they should increase it. It is the equivalent of a monitoring system that stops generating alerts when the infrastructure needs it most.

It is worth noting that the paper sparked debate: a team from CSIC in Madrid replicated part of the experiments and noted that some failures were due to output token limits, not pure cognitive limitations. But the core conclusions — that performance collapses with complexity and CoT does not scale predictably — held up.

05 · Cost

Is it worth paying for reasoning models?

It depends. And that is precisely the answer most vendors do not want to give you.

A case that illustrates the risk: a European company built a "reasoning" agent to classify support tickets. The chain of thought the model generated was narratively flawless. The problem was that 30% of the tickets ended up in the wrong queue, and the model explained with impeccable eloquence why that incorrect classification was the right one. Perfect narrative, wrong result.

This happens because we are confusing quality of explanation with quality of decision. They are different things. A model can produce formally flawless reasoning and reach an incorrect conclusion, exactly the same way a presentation with spectacular charts can defend a wrong strategy.

Rule of thumb

Before paying for reasoning models, benchmark with your real-world use case. Vendor marketing decks show results from their best days. Your data, your edge cases and your specific scenarios determine whether the extra cost is justified.

06 · Perspective

Is AI useless then?

No. And it is important to understand this clearly, because the pendulum can swing to the other extreme just as easily.

An LLM not reasoning like a human does not make it useless. It means you need to understand exactly what it does to use it correctly.

An IBM Power10 system running AIX does not "think" about workloads. It has no intuition. What it has is a high-performance RISC architecture, memory bandwidth that an equivalent x86 cannot match, and mainframe-grade reliability (RAS). If you understand what it does, you use it for what it is worth: critical databases, HPC, AI inference at scale. If you do not understand it, you use it as an expensive x86 server and wonder why it underperforms.

The same applies to LLMs. They are extraordinary language processors. They synthesise, translate, draft, classify and extract text patterns at a speed no human team can match. That is real, it has measurable value, and it is transforming operations across every sector.

What they are not is thinking agents with genuine understanding of the world. And selling the latter when what you have is the former is what is creating an expectations bubble that, sooner or later, will correct itself.

07 · In production

How to use AI in production without falling into the trap

In the world of critical infrastructure — IBM Power, AIX, high-availability clusters — there is a principle that never fails: design with redundancy. You never trust a single component for anything that cannot go down.

1. Do not use the explanation as proof the answer is correct

The model's explanation is generated alongside (or after) the result. It is often a post-hoc rationalisation. If the system makes critical decisions, you need independent verification. No matter how well it explains itself.

2. Benchmark with your real use case before choosing a model

For simple tasks, the cheaper model can outperform the expensive one. For medium tasks, CoT pays off. For highly complex tasks, both fail. The only way to know is to test with your real data, not the vendor's.
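
A benchmark of this kind does not need much tooling. The Python sketch below compares any number of models on the same labelled cases and reports accuracy per difficulty bucket; the case fields ("input", "expected", "difficulty") and the idea of wrapping each model as a plain callable are assumptions made for the example.

```python
from collections import defaultdict


def accuracy_by_bucket(cases, predict):
    """Accuracy of `predict` per difficulty bucket.

    `cases` is a list of dicts with hypothetical keys "input", "expected"
    and "difficulty"; `predict` is any callable wrapping a model.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for case in cases:
        totals[case["difficulty"]] += 1
        if predict(case["input"]) == case["expected"]:
            hits[case["difficulty"]] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


def compare(cases, models):
    """Run every model over the same cases and return {model_name: {bucket: accuracy}}."""
    return {name: accuracy_by_bucket(cases, fn) for name, fn in models.items()}
```

Splitting results by difficulty matters because an overall average can hide the pattern described above, where the ranking of two models flips between simple and medium tasks.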

3. Design architectures with external verification

If your AI architecture is "ask the model and trust what it says", you do not have an architecture. A serious AI deployment includes cross-validation, business rules as a control layer, alerts when model confidence drops, and humans in the loop for critical decisions.
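
As a rough sketch of such a control layer, the Python below routes a model's classification through business rules, a confidence threshold and a human-review path. The threshold value, the rule signature and the route names are all assumptions for illustration; a production system would add logging and alerting around it.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: str
    confidence: float
    route: str  # "auto", "human_review" or "rejected"


def verify(label, confidence, item, rules, min_confidence=0.8, critical=frozenset()):
    """Decide what happens to a model output before anything acts on it.

    `rules` is a list of callables (item, label) -> bool that return True
    when the business rule is satisfied.
    """
    if not all(rule(item, label) for rule in rules):
        return Decision(label, confidence, "rejected")
    if confidence < min_confidence or label in critical:
        return Decision(label, confidence, "human_review")
    return Decision(label, confidence, "auto")
```

The point of the design is that the model's own explanation never appears in the decision path: only independent rules, a measurable confidence value and people do.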

4. Demand evidence, not promises

The AI market is full of extraordinary claims without proportional evidence. A serious vendor shows you benchmarks on your type of data. A less serious vendor shows you a spectacular demo with prepared data.

08 · Our methodology

How do we evaluate AI models for production environments?

At SIXE we apply the same criteria to AI that we have applied to every critical infrastructure component for over 15 years:

  • Testing with real client data, not generic datasets or prepared demos.
  • Performance measurement on edge cases, not just the happy path. Errors do not show up in the median; they show up at the extremes.
  • Redundant architecture always. AI is one more layer in the system, not the entire system. It is complemented by business rules, cross-validation and human oversight where the decision is critical.
  • Model selection by use case, not by marketing. A model with CoT can be perfect for complex text analysis and entirely unnecessary (and more expensive) for simple classification.
  • Infrastructure sized for inference. An AI model is only as good as the infrastructure that supports it. We have verified this first-hand with vLLM on IBM Power and with Ceph as the AI storage backend.
Summary

For executives short on time

The essentials in 6 points

Chain of Thought is not thinking: it is the statistical shape of thinking, a constraint that improves the coherence of generated text.

Apple demonstrated that reasoning models collapse on complex problems and reduce their effort precisely when they should increase it.

Only 2.3% of reasoning steps have causal influence on the model's answer. The rest is decoration.

Do not pay for "reasoning" without measuring it on your specific use case with your real data.

Never use the model's explanation as proof that the answer is correct.

Design with external verification. AI is an extraordinary tool, not an oracle.

Sources

References and cited papers

Apple Machine Learning Research. The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. June 2025. machinelearning.apple.com

Zhao, C. et al. (Arizona State University). Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens. August 2025. arxiv.org/abs/2508.01191

Zhao, J. et al. (UC Berkeley, UC Davis). Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought. October 2025. arxiv.org/abs/2510.24941

Arcuschin, I. et al. Chain-of-Thought Reasoning In The Wild Is Not Always Faithful. March 2025. arxiv.org/abs/2503.08679

Dellibarda Varela, I. et al. (CSIC, Madrid). Rethinking the Illusion of Thinking. July 2025. arxiv.org/abs/2507.01231


AI in production

Need to evaluate how to integrate AI into your infrastructure?

At SIXE we design AI architectures with the same philosophy we apply to any critical system: redundancy, external verification and real benchmarks. Tell us about your case.

Why your RAG pipeline serves last month’s data

RAG · IBM Fusion CAS

Why your RAG pipeline serves last month's data.

Re-vectorizing thousands of documents every time something changes doesn't scale. IBM Fusion CAS integrates vectorization directly into storage: documents change, vectors update themselves.

7 min read · RAG · Storage · Unstructured data

IBM Fusion CAS (Content Aware Storage) is a capability built into IBM Fusion that vectorizes and indexes documents and keeps their vectors continuously up to date directly in the storage layer, without moving data or rebuilding the vector index.

If you have a RAG pipeline in production, you've probably run into this: documents change, but vectors don't. The contract was amended in March, the chatbot still answers with the December version. It's not a model problem — it's that nobody re-ran the ingestion pipeline. CAS solves exactly that.

80–90% of enterprise data is unstructured (source: IBM Redbooks)
40% of AI prototypes never reach production, due to data quality
0 data copies required with CAS (zero-copy ingestion)
01 · The problem

Why do vector embeddings in a RAG pipeline go stale?

Between 80% and 90% of enterprise data is unstructured — PDFs, scanned documents, spreadsheets, contracts, support tickets. In a conventional RAG pipeline, the flow to make them accessible to AI is: extract documents → parse → generate embeddings → load into a vector database → search when a query arrives. It works. Until the documents change.

Versioned technical manuals, contracts with addenda, quarterly financial reports, support tickets that get reopened. Every time something changes, you have to re-run the entire pipeline. With thousands of documents, that means hours of GPU time, massive data movement between systems, and a team babysitting the process. According to IBM, 40% of AI prototypes never reach production precisely because of data quality and availability issues.

The usual alternative is to skip re-vectorization. And then your AI answers with two-month-old information.
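
In a conventional pipeline, the cheapest way to avoid both full re-ingestion and stale answers is to track a content hash per document and re-embed only what changed. The Python sketch below shows that bookkeeping; it illustrates what you would have to build yourself, not how CAS works internally, and the function and key names are hypothetical.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a document's current content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(documents, indexed_hashes):
    """Decide which vectors are stale.

    `documents` maps doc_id -> current text; `indexed_hashes` maps doc_id ->
    the hash recorded when the document was last embedded.
    """
    stale = sorted(
        doc_id for doc_id, text in documents.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    orphaned = sorted(doc_id for doc_id in indexed_hashes if doc_id not in documents)
    return {"re_embed": stale, "delete": orphaned}
```

Someone still has to schedule this check, run the embedding job and update the vector database, which is exactly the orchestration burden described above.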

The security gap nobody sees

In most RAG deployments, vectorization strips out the access controls from original documents. The chatbot has access to the entire vector index, and suddenly a sales rep can extract financial information they shouldn't see because the file's ACLs weren't propagated to the vectors. CAS solves this: vectors inherit the permissions from the source document.
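
The principle of permission-aware retrieval can be sketched in Python as a filter applied to search hits before they reach the model. The `allowed_groups` field and the group names are hypothetical, and this shows the general idea only, not the mechanism CAS uses.

```python
def authorised_chunks(hits, user_groups):
    """Keep only retrieved chunks the user is allowed to read.

    `hits` is a list of dicts with hypothetical keys "text", "score" and
    "allowed_groups" (groups copied from the source document's ACL).
    """
    groups = set(user_groups)
    return [hit for hit in hits if groups & set(hit["allowed_groups"])]


hits = [
    {"text": "Q3 revenue forecast", "score": 0.91, "allowed_groups": ["finance"]},
    {"text": "Product datasheet", "score": 0.87, "allowed_groups": ["sales", "finance"]},
]
print([h["text"] for h in authorised_chunks(hits, ["sales"])])
```

The filter only works if each chunk carries the permissions of its source document, which is the part most pipelines drop during vectorization.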

02 · The solution

What is IBM Fusion CAS and what does it do?

CAS (Content Aware Storage) is a capability built into IBM Fusion that operates on top of Storage Scale. It's not a separate product. Storage goes from being a place where bytes are kept to understanding what's inside each file: its structure, its semantics, and how it has changed since it was last processed.

Figure: AI-Q Research Assistant architecture with IBM Fusion CAS (ingestion, vectorization, and RAG query flow). Source: IBM Community, Sandeep Zende
Capability | Traditional RAG pipeline | IBM Fusion CAS
Data movement | Copy to external system | Zero-copy in place
Vector updates | Full re-ingestion | Automatic incremental
Change detection | Manual / cron | Real-time
Access control on vectors | Not propagated | ACLs inherited
GPU acceleration | Inference only | From ingestion
Orchestration | Scripts + crons + queues | Built into storage

If you already use Docling (or LibrePower's port for IBM Power) with Milvus and an LLM, you don't need CAS for that to work. A deployment with a few hundred PDFs that rarely change is well served by an orchestrated pipeline and a cron. The tipping point comes when documents number in the tens of thousands, change daily, and access control matters.

03 · How it works

How does CAS process documents without moving them out of storage?

IBM Fusion CAS flow — ingestion and query
  1. Document lands or changes in Storage Scale. PDFs, scans, tables, contracts: CAS detects the event automatically.
  2. GPU-accelerated extraction and semantic chunking. OCR, table recognition, layout analysis, all in storage, no copies.
  3. Embedding generation with NeMo Retriever. Vectorization on NVIDIA Blackwell GPUs (RTX PRO 6000), linear scaling.
  4. Incremental indexing in the integrated vector database. Only what changed gets updated, with ACLs inherited from the source document.
  5. RAG query: retrieve → reason → refine → respond. AI-Q Research Assistant: an iterative loop with Nemotron + Llama-3, not a single-shot answer.
Continuous loop: data is automatically re-processed when it changes.

The key difference from a conventional pipeline: there is no manual step between "the document changed" and "the vector index reflects that change." CAS closes that gap automatically, with NVIDIA Blackwell GPUs accelerating every phase — not just final inference. Ingestion and query throughput scales linearly as more NVIDIA RTX PRO 6000 GPUs are added, as documented in the IBM Redbook on NVIDIA AI Data Platform. On BEIR benchmarks (the industry standard for evaluating semantic search), CAS outperforms the most advanced retrieval systems on the market.
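For readers who want to see what "only what changed gets updated" looks like in a hand-built pipeline, here is a rough sketch. It is an assumption-laden stand-in, not how CAS works internally: CAS reacts to storage events, whereas this example compares SHA-256 content digests. The function names and the sample file names are ours.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Stable fingerprint of a document's bytes."""
    return hashlib.sha256(data).hexdigest()


def plan_reindex(current_docs: dict, indexed_digests: dict):
    """Decide what to re-embed and what to drop from the vector index.

    current_docs:    path -> current file bytes
    indexed_digests: path -> digest recorded when the file was last indexed
    """
    to_embed = sorted(
        path
        for path, data in current_docs.items()
        if indexed_digests.get(path) != content_digest(data)
    )
    to_delete = sorted(path for path in indexed_digests if path not in current_docs)
    return to_embed, to_delete


# Illustrative state: one unchanged file, one edited, one new, one removed.
docs = {"a.pdf": b"v1", "b.pdf": b"v2-edited", "c.pdf": b"new"}
indexed = {
    "a.pdf": content_digest(b"v1"),
    "b.pdf": content_digest(b"v2"),
    "old.pdf": content_digest(b"x"),
}

to_embed, to_delete = plan_reindex(docs, indexed)
```

Only the edited and the new file go back through extraction and embedding; the unchanged one is skipped, and the removed one is purged from the index. In a do-it-yourself pipeline, something still has to run this on a schedule, which is exactly the manual step the paragraph above says CAS removes.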

04 · Deployment

On-premises because there is no alternative

The entire architecture runs on-premises. This isn't a preference: if your data falls under GDPR, the EU AI Act, EBA banking regulations, or classified information requirements, sending it to a cloud API for vectorization is not a legal option.

It's the same philosophy we described when talking about building an on-premises AI factory with Ceph and Kubernetes, with one difference: CAS integrates data preparation directly into storage. No separate processing cluster to orchestrate, no message queues between NAS and pipeline, no temporary S3 buckets.

Storage Scale vs Ceph: a new argument

If you're evaluating which storage you need for AI workloads — the decision between Storage Scale and Ceph we covered last week — CAS tips the scale. It's something that only exists in the Storage Scale / Fusion ecosystem and has no direct equivalent in Ceph or any other distributed file system today.

05 · Scope

When does CAS make sense over a hand-built RAG pipeline?

CAS requires IBM Fusion on OpenShift. It's not a component you plug into any infrastructure. If your RAG works fine with Docling + Milvus + a cron job, you don't need this.

It makes sense when several of these conditions apply at once:

  • High volume of unstructured documents that change frequently.
  • Granular access control requirements — healthcare, banking, public administration, legal.
  • Existing or planned IBM infrastructure (Fusion, Storage Scale).
  • Need for the vector index to stay current without manual intervention.
  • Data sovereignty and European regulatory compliance.

On-premises RAG architecture

Need to size an AI architecture on Fusion?

At SIXE we work with IBM Fusion, Storage Scale, and RAG pipelines in production. Tell us about your use case and we'll help you design the solution.

Servers & Storage in Stock | Delivery Under 30 Days

Availability · May 2026

Servers and storage in stock. Delivered in under 30 days.

The SSD supply crisis and surging demand for AI infrastructure are pushing industry lead times to 3–6 months. At SIXE we have hardware available, we configure it internally, and we deliver in under 30 days.

8 min read · Storage · Servers · Infrastructure

If you're looking to buy a rack server or an enterprise storage array and your usual supplier has quoted weeks or months, it's not an isolated case. It's the new normal.

The good news: we have stock. IBM Power10 and Power11, Dell PowerEdge, Lenovo ThinkSystem servers. IBM FlashSystem storage arrays across all ranges. Configured by our own engineers and ready for production.

<30
Days guaranteed
delivery
3–6
Months average
industry lead time
100%
In-house config
— no middlemen
01 · Context

Why getting hardware is a real problem in 2026

The hardware supply crisis didn't end with the pandemic. It transformed. In 2026, chips are no longer scarce, but global demand for AI compute and storage is consuming SSD, memory and server component production at a rate the supply chain cannot match.

The consequences for any business needing to refresh or expand its on-premise infrastructure are concrete:

  • Migration projects stalled due to unavailable hardware.
  • Expired maintenance contracts with no replacement equipment in sight.
  • Capacity expansions arriving too late for business commitments.
  • Rising prices across the channel due to flash and NVMe SSD shortages.
[Chart: Actual lead times, industry vs SIXE (May 2026). Categories: all-flash storage (SIXE: FlashSystem), GPU server for AI (SIXE: servers), x86 rack server (SIXE: Dell / Lenovo). Legend: industry, high lead time; industry, variable; SIXE, own stock.]

Key fact

IBM designs and manufactures its own flash drives (FlashCore Modules). Unlike Dell, HPE or NetApp — who depend on third-party SSD suppliers — IBM controls its own flash storage supply chain. That translates directly into shorter, more predictable lead times for IBM Business Partners like SIXE.

02 · Storage

IBM FlashSystem storage arrays in stock

If there's one product where urgency is highest and SIXE's delivery advantage matters most, it's enterprise storage. The NVMe SSD shortage is stretching lead times for all all-flash arrays. But IBM manufactures its own flash drives — and that lets us hold stock when others can't.

Figure: IBM FlashSystem storage arrays, 2026 generation with FlashCore Module 5 (FCM5). All-flash NVMe enterprise storage with FlashCore Modules.

Entry range: FlashSystem 5015 and 5045

The gateway to enterprise all-flash storage. A 2U dual-controller array with compression, deduplication and AI-powered predictive analytics included. Ideal for SMB storage, backup or mixed workload consolidation. Check availability and conditions.

Mid-range: FlashSystem 5200 and new 5600

For businesses needing real performance. The new FlashSystem 5600 features fifth-generation FlashCore Modules (FCM5) in NVMe EDSFF format with up to 105 TB per module. Hardware-level ransomware detection in under a minute, immutable snapshots (safe-guarded copies) and encryption at rest as standard.

Enterprise: FlashSystem 7200/7600 and 9600

For mission-critical environments: SAP HANA, Oracle databases, AI factories, HPC. The 9600 scales to hundreds of PB with sub-100μs latencies.

Hardware ransomware protection

All FlashSystem arrays include threat detection at the I/O level, beneath the OS and filesystem. Safe-guarded copies create immutable snapshots that cannot be deleted or modified, even with root access.

Beyond the arrays

We also supply IBM Storage Scale (GPFS) for HPC and AI inference, Storage Protect and Storage Defender for enterprise backup, and Storage Fusion for Red Hat OpenShift and Kubernetes architectures.

Competitive terms

As a direct IBM Business Partner we have access to conditions other distributors cannot offer. If you already have a quote from another vendor, talk to us before you decide: we may well surprise you.

03 · Servers

Enterprise servers: IBM Power, Dell PowerEdge, Lenovo

  • IBM Power10 (in stock): S1012, S1022, S1024, E1050, E1080. AIX, IBM i and Linux. PowerVM. Extreme consolidation.
  • IBM Power11 (new 2026): S1122, S1124, E1150, E1180. New generation. Built-in AI and energy efficiency.
  • Dell PowerEdge (in stock): T360 (tower server), R660, R760 (rack), XE9680 (8× NVIDIA H100 GPU server for AI).
  • Lenovo ThinkSystem (in stock): x86 rack servers for virtualisation, databases and enterprise applications.

IBM Power: runs Linux exactly like an x86 server

This is where many clients pause. "But it's Power…". IBM Power servers run Red Hat, SUSE and Ubuntu natively. You install your Linux distro, spin up containers, deploy OpenShift or Kubernetes. The admin experience is identical.

The difference is underneath. A single Power10 or Power11 consolidates what previously required four or five racks of commodity servers. PowerVM virtualises with a reliability that VMware can no longer promise — or charge you for, since Broadcom changed the licensing model. For anyone looking for a VMware alternative without starting KVM from scratch, Power is the lowest-risk option.

Dell and Lenovo: x86 when x86 is the right answer

Not everything needs a Power. For an SMB tower server running a local ERP, a Dell T360 solves the problem. For the data centre, the R660 and R760 rack servers are the market standard. And if you need a GPU server for on-premise AI inference, the Dell XE9680 with 8 NVIDIA H100 GPUs is the most versatile AI server available today. Check availability and conditions.

04 · Catalogue

What we stock and what it's for

Product | Use case | Availability
FlashSystem 5015/5045 | SMB, backup, consolidation | In stock
FlashSystem 5200/5600 | ERP, SAP, virtualisation, AI | In stock
FlashSystem 7600/9600 | Mission-critical, HPC, Oracle | In stock
Dell T360 | Tower, SMB, ERP, file server | In stock
Dell R660 / R760 | Rack, virtualisation, databases | In stock
Dell XE9680 | On-premise AI, 8× NVIDIA H100 | Enquire
Power10 S1022/S1024 | Linux, AIX, IBM i, consolidation | In stock
Power11 S1122/S1124 | New generation, built-in AI | In stock
Lenovo ThinkSystem | Standard x86, simple management | In stock

Check availability and conditions →

05 · Service

We don't deliver a box — we deliver a production system

Every hardware project includes solution design, configuration and validation at our facilities, on-site or remote installation, and post-sale support. We don't ship unconfigured hardware.

Our engineers — the same people who deliver official IBM training and provide L2/L3 support — know every product because they work with them daily. If you need to migrate from VMware, from another vendor's array, or from an unsupported server, we handle it.

Optionally: annual maintenance contracts 8×5 or 24×7 with guaranteed SLA.

No middlemen

We're a direct IBM Business Partner. No distributors, resellers or intermediate layers between your project and the manufacturer. That means direct access to IBM stock, partner terms, and the ability to escalate to manufacturer support when needed.


Check availability

Don't wait for the supply chain to get worse.

Tell us what you need — server, storage or both — and we'll confirm availability and delivery timeline within 24 business hours.

Storage Scale vs Ceph for AI Inference: How to Choose

Storage Scale vs Ceph · AI Inference

Storage Scale vs Ceph for AI inference: how to choose.

We deploy both. We've run both in production for years. And no, the answer isn't "use both" — it's understanding what each does well, where each falls short, and which fits your workload.

12 min read · Storage · AI · Architecture

The same question keeps showing up in different disguises: "Storage Scale or Ceph for AI inference?"

We're an IBM Business Partner. We sell Storage Scale. We also deliver Ceph training and run Ceph consulting engagements in production. We work with both daily, so what follows comes from building real architectures — not from reading datasheets.

This is what we'd tell a client sitting across the table designing their AI storage architecture.

01 · First things first

Before you choose: what AI inference actually needs from storage

Most people start with the product. We start with the access pattern, because that's what determines whether the choice works or gives you headaches for years.

A real inference environment — not a Llama demo on a laptop — looks like this:

What lives on your inference storage
    # The heavy stuff: parallel reads, many nodes at once
    models/llama-70b/    ← 40-140 GB in safetensors shards
    models/embedding/    ← small but constantly accessed

    # RAG: millions of small files, mixed access
    rag/raw/             ← PDFs, emails, images, audio
    rag/parsed/          ← Docling/OCR output
    rag/chunks/          ← fragments, JSONL, parquet
    rag/embeddings/      ← vectors

    # Operations: batch, logs, adapters
    batch-jobs/          ← batch inference input/output
    checkpoints/         ← LoRA adapters, fine-tuning
    logs/                ← traceability, evaluation

And all of that gets consumed by a zoo of processes: GPU nodes running vLLM or TGI, CPU nodes, Spark or Ray preprocessing, Docling and OCR pipelines, vector databases, legacy apps coming in over NFS, and someone from compliance who needs to see a PDF from Windows via SMB.

If your setup looks like this — many consumers, many protocols, shared data — that pattern should drive the decision. Not the price per TB or the vendor logo.

02 · Storage Scale

Where Storage Scale wins for AI inference

Native POSIX: your frameworks expect a directory, not a bucket

This seems obvious until you have to set it up. Look at how the tools you'll actually use load models:

How real frameworks load models
    # HuggingFace Transformers
    model = AutoModel.from_pretrained("/models/mistral-7b")

    # vLLM
    vllm serve /models/Meta-Llama-3-8B-Instruct

    # Triton Inference Server
    model_repository: /models/triton-repo/

They want a path. A directory. Not an S3 endpoint. Storage Scale (formerly GPFS) is a native parallel POSIX filesystem. That /models is a shared directory that all nodes read concurrently, with real concurrent access, no intermediate copies. Nothing to invent.

IBM Storage Scale — official docs

Zero copies between layers: the argument we find most convincing

With Ceph S3, the typical pattern for serving a model goes: download from bucket → write to local disk or PVC → start the inference engine → serve. That's three steps before the first query lands. And if you have 16 nodes, all 16 download their own copy.
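The arithmetic behind "all 16 download their own copy" is worth spelling out. This is a back-of-the-envelope sketch; the 140 GB figure is the upper end of the model size range quoted earlier in this article, and the function name is ours.

```python
def staged_copy_gb(model_size_gb: float, nodes: int, shared_filesystem: bool) -> float:
    """GB written to node-local storage before any node can start serving.

    Counts staged copies only. In both patterns each node still has to read
    the weights into GPU memory afterwards.
    """
    if shared_filesystem:
        return 0.0  # nodes open the shared path directly, nothing is staged
    return model_size_gb * nodes


# 16 inference nodes, one 140 GB model.
with_local_cache = staged_copy_gb(140, 16, shared_filesystem=False)
with_shared_path = staged_copy_gb(140, 16, shared_filesystem=True)
```

That is 2,240 GB pulled from the object store and written to local disks for a single model version, and the same again every time the model is swapped.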

With Storage Scale, the inference engine points to /gpfs/models/llama-70b/ and starts. Done. No download, no cache, no "does this node have the latest version?". When you update the model, you update it once and every node sees it.

This matters most when you're iterating — swapping models, testing LoRA adapters, rotating configurations. With local cache you end up maintaining sync scripts, invalidation logic and disk cleanup. With a parallel filesystem there's nothing to maintain.
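To give a feel for the "invalidation logic" you end up owning with local caches, here is a deliberately small sketch. It compares file names and sizes between the source of truth and a node's cache. Everything here is illustrative: the function names are ours, the temporary directories stand in for a bucket mirror and a local disk, and a size-only check will miss an update that happens to keep the same size (a real script would hash contents or track versions).

```python
import hashlib
import tempfile
from pathlib import Path


def model_fingerprint(model_dir) -> str:
    """Hash relative file names and sizes; cheap, does not read the weights."""
    root = Path(model_dir)
    digest = hashlib.sha256()
    files = sorted(p for p in root.rglob("*") if p.is_file())
    for path in files:
        entry = f"{path.relative_to(root).as_posix()}:{path.stat().st_size}\n"
        digest.update(entry.encode("utf-8"))
    return digest.hexdigest()


def cache_is_stale(source_dir, cache_dir) -> bool:
    """True when the local cache no longer matches the source directory."""
    return model_fingerprint(source_dir) != model_fingerprint(cache_dir)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "source")
    cache = Path(tmp, "cache")
    source.mkdir()
    cache.mkdir()
    (source / "model.safetensors").write_bytes(b"weights-v1")
    (cache / "model.safetensors").write_bytes(b"weights-v1")
    stale_before = cache_is_stale(source, cache)

    # The model is retrained and replaced at the source; the cache is now old.
    (source / "model.safetensors").write_bytes(b"weights-v2-retrained")
    stale_after = cache_is_stale(source, cache)
```

Multiply this by every node, every model and every adapter, and add the cleanup of old versions. With a shared parallel filesystem there is no cache to compare against in the first place.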

Multi-protocol on the same file

This is what solves the enterprise headache. A single file in Storage Scale can be consumed via POSIX (GPU node), S3 (modern app), NFS (data team), SMB (someone on Windows) and CSI (Kubernetes pod). The same file. Not a copy per protocol. Not a different namespace per interface.

IBM implements this through Cluster Export Services (CES), which exposes S3, NFS and SMB access over the same data in the parallel filesystem.

In an environment where modern containers coexist with legacy applications nobody dares touch, this is what lets you build an AI factory without breaking what already works.

Metadata: when you have millions of small files

Enterprise RAG isn't "three PDFs in a bucket". It's millions of documents, millions of chunks, millions of embeddings, config files, auxiliary indices, shards, logs. Heavy operations on large directories with many small files. Storage Scale has been solving this in HPC environments for decades — Summit, Sierra and other supercomputers ran on GPFS. CephFS can handle this, but in our experience it takes significantly more design effort to keep it from struggling.

03 · Ceph

Where Ceph wins for AI inference storage

Massive object storage: real S3, not S3 as an afterthought

Ceph was built as distributed storage for objects, blocks and files. Its RGW (RADOS Gateway) provides a full S3 API with lifecycle policies, versioning, multi-tenancy, IAM — everything you need to run a proper object store. It's not a bolt-on. It's the core.

If your inference pipeline is S3-native — models downloaded from a bucket, datasets read via API, results written as objects — Ceph handles it well. A well-designed Ceph cluster scales horizontally to hundreds of PB by adding commodity nodes.

Cost per TB: Ceph wins this one outright

Let's be direct: Storage Scale costs more. It needs IBM licences, hardware with specific requirements, and people who know how to run it (there aren't many). Ceph runs on commodity hardware, has no software licence cost, and a team with solid Linux experience can operate it.

For a client with petabytes of data where most of it is cold — training datasets, historical archives, model backups — there's no reason to pay Storage Scale prices for TBs that get read once a month. Ceph with well-configured erasure coding is the right answer there.
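The reason erasure coding matters for cold data is simple capacity arithmetic. A k+m erasure-coded pool stores k data chunks plus m coding chunks, so the usable share of raw capacity is k/(k+m); a replicated pool with n copies gives you 1/n. The 4+2 profile and the 1,000 TB raw figure below are illustrative choices, not a sizing recommendation.

```python
def usable_fraction_ec(k: int, m: int) -> float:
    """Usable share of raw capacity for a k+m erasure-coded pool."""
    return k / (k + m)


def usable_fraction_replica(copies: int) -> float:
    """Usable share of raw capacity for an n-way replicated pool."""
    return 1 / copies


raw_tb = 1000  # illustrative raw capacity

ec_usable_tb = raw_tb * usable_fraction_ec(4, 2)            # 4+2 profile
replica_usable_tb = raw_tb * usable_fraction_replica(3)     # 3 copies
```

On the same raw disks, 4+2 gives roughly twice the usable capacity of three-way replication while still tolerating two lost chunks. The trade-off is more CPU and network work per write and during recovery, which is why it suits cold data better than hot model directories.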

Kubernetes: Rook makes everything trivial

For teams that live in Kubernetes or OpenShift, Ceph with Rook is hard to beat. A single operator that gives you RBD (ReadWriteOnce), CephFS (ReadWriteMany) and RGW (S3) from one cluster. OpenShift Data Foundation (ODF) is literally Ceph packaged by Red Hat — we cover this in detail in our Ceph vs MinIO 2026 guide.

Storage Scale has CSI too, but Rook/Ceph has been in the Kubernetes ecosystem longer and the community is much larger. If your team thinks in operators, Helm charts and GitOps, Ceph speaks their language.

Block storage for VMs and databases

If you also run OpenStack, virtualisation, or need block volumes for databases alongside inference, Ceph's RBD is best-in-class. Storage Scale doesn't compete here — it's not its territory.

Scale

CERN runs over 60 PB on Ceph in production, underpinning its OpenStack infrastructure. They've gone from a few PB to exabyte scale in a decade, adding nodes without architectural disruption. We cover this in more depth in our article on open source storage for AI and HPC.

04 · The downsides

What each gets wrong — and nobody likes talking about

This is where most articles get vague. We won't. We deploy both, and both have things we don't like.

What we don't like about Storage Scale

  • It's expensive. IBM licences, specific hardware requirements, and a rack with Storage Scale ECE plus GPUs isn't a small investment. For an AI pilot or a startup, it doesn't make sense.
  • Running it requires HPC expertise. It's not that it's difficult — it's a different world from cloud-native. If your team lives in Kubernetes and has never touched a parallel filesystem, the learning curve is real.
  • S3 isn't its strong suit. Storage Scale has S3 access via CES, and it works. But if you compare it with Ceph's RGW on pure S3 features — lifecycle, multi-tenancy, advanced versioning — Ceph has the edge.
  • Block storage: essentially absent. If you need RBD or block volumes for VMs, Storage Scale is not your tool.

What we don't like about Ceph

  • CephFS is not GPFS. CephFS works, but for many concurrent clients doing parallel I/O across millions of files (the classic AI/HPC pattern), Storage Scale has considerably more mileage. We explained this in our 2023 comparison.
  • Local cache adds complexity. If your models live in S3, every inference node downloads its copy. With 4 nodes that's trivial. With 32, you're maintaining sync scripts, tracking cache versions, and hoping local disks don't fill up.
  • Multi-protocol isn't clean. Ceph speaks RGW (S3), RBD (block), CephFS (file) and NFS (via Ganesha). But each protocol operates on its own pool or namespace. You can't transparently read the same file via S3 and NFS the way Storage Scale lets you.
  • Metadata under pressure. Intensive operations on directories with millions of small files (the RAG use case) can bottleneck in CephFS if the design isn't right. Ceph doesn't forgive improvisation.
Common trap

Mounting s3fs or goofys to give POSIX semantics to Ceph S3 so you can use from_pretrained() directly. Technically works. In production, the POSIX semantics are partial, performance is unpredictable, and the errors get creative. We don't recommend it as a permanent solution.

05 · The comparison

Storage Scale vs Ceph for AI storage: summary

Criteria | Storage Scale | Ceph
POSIX for AI | Native | CephFS
Object store S3 | Via CES | Native
Block storage | No | RBD
Shared model loading | Direct | Via cache
RAG / many files | Strong | If S3
Multi-protocol / same file | Yes | Not clean
Kubernetes | CSI | Rook
Cost per TB | High | Low
Operations | HPC | SRE
AI/HPC heritage | Decades | Growing

The short version: Storage Scale wins when data is "alive" — many processes reading shared models, POSIX pipelines, mixed environments where Kubernetes and legacy apps coexist. Ceph wins when data is objects — S3 as the primary interface, models cached locally, cloud-native teams, tight budget.

Using Storage Scale as a cheap object store is a waste of money. Trying to make CephFS behave like an HPC parallel filesystem is asking for trouble. Each is very good at what it does.

06 · Our take

What we'd tell a client asking today

If pushed: for AI inference in enterprise environments with shared data and RAG, Storage Scale causes fewer headaches. Models are alive, shared, accessible via whatever protocol each consumer needs. No sync scripts, no cache prayers.

But if your pattern is genuinely cloud-native — stateless pods, S3 as source of truth, an SRE team that knows how to run Ceph — then Ceph is the right call and it'll cost you considerably less. We're not saying that to be polite: we've seen it work this way in production many times.

And if your environment runs sensitive data on IBM Power, integrates with DB2 or Oracle, and inference will coexist with HPC or analytics workloads — Storage Scale has no real competitor there. It's its natural territory. Storage Scale combined with Content-Aware Storage in IBM Fusion is starting to turn storage into an active data preparation engine for RAG.


Storage Scale or Ceph?

It depends. Tell us your use case and we'll tell you which.

We run both in production. Tell us about your workload and we'll point you in the right direction.

IBM Fusion & NVIDIA Blackwell: storage for AI on-premises

IBM Storage · NVIDIA · AI

IBM Fusion & NVIDIA Blackwell: storage now processes data for AI.

GTC 2026 brought an IBM-NVIDIA collaboration far deeper than it appears. Fusion is no longer just storage for containers: with Content-Aware Storage and Blackwell GPUs, storage becomes an active AI data preparation engine — the critical layer for enterprise RAG at scale.

8 min read · Storage · AI · Infrastructure

On 16 March, IBM took the stage at GTC 2026 in San José with an announcement that passed largely unnoticed outside storage circles: an expanded collaboration with NVIDIA spanning Blackwell Ultra GPUs in IBM Cloud, GPU-native data analytics, intelligent document processing, and on-premises deployments for regulated industries.

Three weeks later, IBM published a technical Redbook detailing how to integrate Storage Scale, Fusion and Content-Aware Storage (CAS) with the NVIDIA AI Data Platform. And recently, IBM, NVIDIA and Samsung demonstrated a CAS system capable of managing 100 billion vectors on a single server — the kind of scale that breaks traditional RAG pipelines.

What does this actually mean in practice? Is it a real architectural shift or keynote marketing? Here's our analysis.

The announcement

GTC 2026: IBM and NVIDIA get serious about enterprise AI

What IBM announced at GTC is not a generic partnership. These are five concrete workstreams that directly affect how enterprises deploy AI on-premises — and all of them connect back to storage for AI:

  • NVIDIA Blackwell Ultra GPUs on IBM Cloud — available from Q2 2026 for large-scale training, high-throughput inference and AI reasoning.
  • Content-Aware Storage (CAS) integrated into the next Fusion release — storage stops being passive and starts processing data for AI.
  • Red Hat AI Factory with NVIDIA — OpenShift + NVIDIA GPUs as the standardised platform for deploying AI in production.
  • IBM Consulting + NVIDIA Blueprints — integration services to move AI from pilot to production.
  • NVIDIA AI Data Platform (AIDP) support — a reference design integrating compute, networking and storage into a unified AI system.
Source: IBM Newsroom, 16 March 2026

The most impactful data point for on-premises infrastructure: Fusion HCI already includes GPU servers with NVIDIA H200 and RTX Pro 6000 Blackwell Edition. This is not a roadmap — the hardware is available today. Each system supports up to 4 GPU servers with 8 cards each.

To understand how all the pieces fit together, here is the full stack IBM has defined as the AIDP reference architecture on Fusion:

Rob Davis, VP of Storage Networking Technology at NVIDIA, was direct: AI agents need to access, retrieve and process data at scale, and today those steps happen in separate silos. The integration of CAS with NVIDIA orchestrates data and compute across an optimised network fabric to overcome those silos.

The technology

Content-Aware Storage: when storage understands what it holds

This is the most interesting part of the announcement and the least covered. Until now, enterprise storage was a passive repository: it stored files and served them on request. To run RAG (Retrieval-Augmented Generation) or feed AI models with corporate data, you needed a separate pipeline that extracted documents, chunked them, vectorised them and pushed them into a vector database.

CAS eliminates that external pipeline. It operates in two phases — visualised below:

Phase 1: Continuous ingestion and preparation

CAS monitors folders in Storage Scale (or external storage via AFM) and detects changes in real time. When a document is modified or added, CAS processes it automatically: content extraction from text, tables, charts and images using NVIDIA NeMo Retriever, semantic chunking, and conversion into high-dimensional embeddings. Vectors are indexed in a CAS-managed vector database on Storage Scale ECE.

Phase 2: Query and retrieval

When a user or AI agent asks a question, CAS performs semantic search, keyword (BM25) or hybrid retrieval. Results pass through an NVIDIA-optimised reranker for maximum relevance. Critically: vectors inherit the access controls (ACLs) from the original documents. If a user cannot read a file, they cannot see its vectors in RAG results either.
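"Hybrid retrieval" means merging a semantic ranking and a keyword ranking into one list. The source material does not say how CAS fuses the two, so the sketch below uses reciprocal rank fusion (RRF), a common generic technique, purely as an illustration. The document IDs are made up, and k=60 is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) per list it appears in;
    ties are broken alphabetically to keep the output deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["doc_a", "doc_b", "doc_c"]   # from vector similarity
keyword_hits = ["doc_b", "doc_d", "doc_a"]    # from BM25

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Documents that both retrievers agree on rise to the top (here doc_b, then doc_a), and a reranker would then reorder that shortlist for relevance.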

Source: IBM Redbook MD248598, Enabling AI Inference at Scale, April 2026
Why this matters

Most enterprise RAG deployments fail at two points: data goes stale because nobody updates the vector database, and there is no access control on the vectors. CAS solves both problems at the infrastructure layer, not the application layer. That is a genuine paradigm shift.

IBM + NVIDIA + Samsung demo
100 billion
vectors on a single server with decoupled compute and storage, GPU-accelerated hierarchical indexing. At that scale, traditional RAG indices become unmanageable.
Source: SDxCentral, April 2026
The hardware

H200, RTX Pro 6000 and Blackwell Ultra: which GPU goes where

There are three NVIDIA GPU lines in the IBM ecosystem that are worth keeping straight. Each has a distinct role, with its own deployment target and use case:

NVIDIA Blackwell Ultra
GTC 2026 · Cloud-first
IBM Cloud
  • Availability: IBM Cloud · Q2 2026
  • Use case: Large-scale training, high-throughput inference, AI reasoning
  • Deployment: Cloud only · no on-prem option in Fusion
  • Integration: Red Hat AI Factory + VPC servers with compliance controls
If your workload can go to cloud and you have no data residency restrictions, Blackwell Ultra on IBM Cloud is the most powerful option in the catalogue. But if your data cannot leave the perimeter, look at the two on-premises GPUs that follow.
NVIDIA H200
Hopper · Extended HBM3e memory
Fusion HCI on-prem
  • Availability: Fusion HCI · May 2026
  • Use case: Training, fine-tuning and heavy LLM inference
  • Memory: 141 GB HBM3e · 4.8 TB/s bandwidth
  • Configuration: 2 GPUs per server · Up to 4 servers per rack
  • Maximum total: 32 GPUs per Fusion system
The H200 is the option for serious on-premises training. If you've read our article on vLLM inference on IBM Power, this is the x86 equivalent for Fusion HCI. Its extended HBM3e memory versus the H100 makes it ideal for large models that previously required aggressive sharding. In Fusion HCI it accesses Storage Scale ECE directly over a 200 GbE fabric.
NVIDIA RTX Pro 6000
Blackwell Edition · Inference + visualisation
Fusion + AIDP
  • Availability: Fusion HCI · May 2026
  • Use case: Inference, RAG, CAS vectorisation, professional visualisation
  • Architecture: Blackwell Server Edition · 96 GB GDDR7
  • Configuration: 2 GPUs per server · Up to 4 servers per rack
  • AIDP stack: + BlueField-3 DPU · ConnectX-7/8 SuperNICs
The RTX Pro 6000 Blackwell is the GPU in the AIDP reference stack. It accelerates CAS semantic chunking and vectorisation, and combined with the BlueField-3 DPU it offloads network and storage processing from the main CPU. It is the critical piece for production CAS-RAG.
Source: IBM Redbook MD248598, Reference AIDP stack
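A quick way to read the memory figures above (141 GB on the H200, 96 GB on the RTX Pro 6000) is to ask how many GPUs a model's weights need. The sketch below is a rough rule of thumb, not a sizing tool: weights take roughly parameters × bytes per parameter, and the 0.8 headroom factor, which reserves space for KV cache and activations, is our own assumption.

```python
import math


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in decimal GB."""
    return params_billion * bytes_per_param


def min_gpus_for_weights(params_billion, bytes_per_param, gpu_memory_gb, headroom=0.8):
    """Smallest GPU count whose usable memory holds the weights.

    headroom is the assumed share of GPU memory available for weights;
    the rest is left for KV cache, activations and runtime overhead.
    """
    usable_per_gpu = gpu_memory_gb * headroom
    return math.ceil(weights_gb(params_billion, bytes_per_param) / usable_per_gpu)


# A 70B-parameter model at 16-bit (2 bytes) and 8-bit (1 byte) precision.
h200_fp16 = min_gpus_for_weights(70, 2, 141)
h200_fp8 = min_gpus_for_weights(70, 1, 141)
rtx_fp16 = min_gpus_for_weights(70, 2, 96)
```

A 70B model at 16-bit is about 140 GB of weights, so it technically squeezes into one H200 but leaves no room to serve anything; with headroom it wants two. At 8-bit it fits on a single card of either type under these assumptions.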
What is not obvious

BlueField-3 is not just a fast NIC. It is a DPU (Data Processing Unit) that offloads network, storage and security operations from the main CPU. In an AIDP system, the BlueField-3s accelerate communication between Storage Scale and the GPUs, reducing data access latency for real-time inference. It is a critical piece that does not appear in keynotes but makes the difference in real-world performance.

The analysis

What this means for on-premises AI

Putting all the pieces together, the IBM message is clear: Fusion is no longer a container storage product. It is an on-premises AI platform integrating compute (OpenShift), acceleration (NVIDIA GPUs), intelligent storage (Storage Scale + CAS) and optimised networking (Spectrum-X + BlueField-3) in a unified appliance.

For organisations that cannot — or choose not to — send their data to the cloud, this is significant. Especially in three scenarios:

Regulated industries

Banking, healthcare, public sector. Data cannot leave the perimeter. With Fusion HCI + CAS + NVIDIA GPUs you can run corporate RAG on internal documents without anything leaving the rack. And ACLs are enforced at the vector level — compliance built-in, not bolted-on.

AI on proprietary data at scale

IBM estimates 80-90% of enterprise data is unstructured. CAS converts that volume into AI-consumable data continuously and automatically. This is not a one-off ETL project — it is a permanent infrastructure capability.

Alternative to cloud when TCO does not add up

IBM keeps repeating the figure of Databricks-equivalent performance at 60% of the cost. This is an internal benchmark on selected operations, so it deserves some scepticism. But the economic logic of on-premises for predictable, high-volume workloads remains solid. If you know you'll have 30 GPUs running 24/7, on-premises TCO usually wins.
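If you want to test that logic against your own numbers, the break-even calculation is short. Every figure in the example is a placeholder we invented to show the mechanics (capex, monthly running cost, cloud price per GPU-hour); none of them is a quote or a benchmark, and 730 is simply the average number of hours in a month.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def breakeven_months(onprem_capex, onprem_monthly_opex, cloud_hourly_per_gpu,
                     gpus, utilization=1.0):
    """Months until on-premises capex is paid back by avoided cloud spend.

    Returns None when cloud is cheaper per month than running on-premises,
    in which case there is no break-even point.
    """
    cloud_monthly = cloud_hourly_per_gpu * gpus * HOURS_PER_MONTH * utilization
    monthly_saving = cloud_monthly - onprem_monthly_opex
    if monthly_saving <= 0:
        return None
    return onprem_capex / monthly_saving


# Placeholder inputs: 30 GPUs busy 24/7 versus the same GPUs used 20% of the time.
always_on = breakeven_months(1_500_000, 20_000, 4.0, 30)
mostly_idle = breakeven_months(1_500_000, 20_000, 4.0, 30, utilization=0.2)
```

With these placeholder inputs, the always-on case pays back in under two years, while the mostly idle case never does. Utilization is the variable that decides the outcome, which is why the argument only holds for predictable, high-volume workloads.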

Our take

Real or marketing?

A bit of both, as always. What is unambiguously real:

  • The hardware exists and can be purchased. The H200 and RTX Pro 6000 are available as GPU servers for Fusion HCI. This is not a roadmap.
  • CAS works. The 100-billion-vector demo is verifiable. The Redbook details the architecture step by step.
  • NVIDIA AIDP is a real reference design with early adoption in healthcare (UT Southwestern Medical Center) and finance.
  • Red Hat AI Factory standardises OpenShift + GPU deployment as an AI platform — exactly what Fusion HCI delivers as an appliance.

What deserves some nuance:

  • CAS is not yet in Fusion GA. IBM said Q2 2025, then Q2 2026. It's been integrated in Storage Scale since March 2025, but the embedded Fusion version is still landing.
  • The 60% cost vs Databricks figure is an internal benchmark under controlled conditions. In real production, the benefit will depend on your workload.
  • Fusion HCI is not cheap. A rack with H200 GPUs, 16 storage nodes and OpenShift licences is a significant investment. It makes sense for organisations with sensitive data and predictable workloads — not for an AI pilot.
SIXE take

The most significant part of this wave is not the GPUs — everyone has those. It is CAS. Storage that semantically understands what it holds and maintains a real-time vector database with inherited ACLs is a genuine architectural shift. If it works as promised (and the demos suggest it does), it resolves the two main problems with enterprise RAG: data freshness and access security.

That said, not everyone needs Fusion HCI to benefit. CAS lives in Storage Scale, which can also be deployed as software-defined on your own hardware. And if your data volume does not justify Storage Scale, Ceph with a conventional RAG pipeline remains a viable and more cost-effective alternative.

As always, the answer depends on volume, data sensitivity and budget. We'll help you evaluate it.


Evaluating on-premises AI?

Tell us your use case. We help you size the right solution.

Fusion HCI, Fusion Software, standalone Storage Scale or Ceph — it depends on what you need. We do not sell a single solution; we help you choose the right one.

Still on Informix 12.10? You need to read this

IBM Informix · Migration

Still on Informix 12.10? You need to read this.

12.10 support is over. But while you weren't looking, Informix changed quite a bit: standalone containers for Kubernetes, a rebuilt engine under the hood, native S3 backup, and a certification that's switching versions.

6 min read · Databases · Cloud-native

In March, Anup Nair — Principal Technical Product Manager for Informix — published the Informix Standalone Container announcement on IBM TechXchange. Enterprise-grade images for Kubernetes and OpenShift, with full support, no Cloud Pak for Data required. The post has 13 views and zero comments. Almost nobody noticed.

And yet, it's probably the most significant change in how Informix is deployed since IBM acquired it. Combined with the end of 12.10 support, the new capabilities in Informix 15 and the v15 certification transition, the landscape has shifted. Here's the summary.

Informix updates — standalone containers, 12.10 migration and official training
The news

Standalone containers: what's changed

Until recently, running Informix in containers with enterprise support in production required Cloud Pak for Data (CP4D). The Docker Hub images — which have since moved to the IBM Container Registry (ICR) — were Developer Edition only: fine for dev and test, no IBM support for production.

The Informix Standalone Container removes that dependency:

  • Production-grade images for Informix v14 and v15 on IBM Container Registry.
  • Direct deployment on Kubernetes and OpenShift without CP4D.
  • Full enterprise support under existing entitlement — not a separate product.
  • Developer to Enterprise Edition upgrade by swapping the image and applying the installer.
  • CASE bundles on GitHub: ibm-informix-standalone.
Source: Anup Nair, IBM TechXchange Community, March 2026

In practice, this means integrating Informix into CI/CD pipelines, spinning up instances for automated tests, deploying with Helm Charts across hybrid-cloud environments, and having dev environments identical to production. Daniel Weber, Informix Container Dev Lead, demonstrated the process at the April IIUG Tech Talk.

It's the kind of announcement that doesn't make headlines but changes how you operate a product day to day.

The cut-off

Informix 12.10 has lost support

On 30 April, IBM completed the transition of Informix 12.10 to Extended Support. Three direct consequences:

  • No more security patches. Any vulnerability discovered from now on stays unresolved.
  • No more bug fixes. The current fixpack is the last one.
  • Technical support only at extra cost under Extended Support, until 2030.
Source: IBM Product Lifecycle — Informix 12.10

The CSDK 4.10 (Client SDK) also transitioned to Extended Support on the same date, which affects legacy ESQL/C, ODBC, and JDBC drivers.

12.10 was released in 2013. Thirteen years is a reasonable lifecycle. But many production environments are still running it because "it works" and there was never enough pressure to migrate. Now there is — and it coincides with a genuine modernisation path: containers, native encryption, S3 backup.

The decision

14.10 or 15: which one for migration

Informix 15 (November 2024) is the first major release in over five years, with deep internal engine changes: drastically expanded limits (a single partition can now hold 140 trillion pages), native dbspace encryption, improved Wire Listener (MongoDB 3.2–4.2), native S3/Azure/IBM COS backup via ON-Bar, official Helm Charts, and mandatory Java 11.

Informix 14.10xC13 (January 2026) is the latest release on the 14.10 branch: archecker improvements for SmartLOBs, Direct I/O for 2K/4K blocks, and accumulated fixes. Mature, battle-tested, conservative.

Stable · Proven
Informix 14.10xC13
January 2026 · Mature release
Risk: Low
Upgrade: In-place from 12.10
Drivers: High compatibility
Dbspace encryption: No
S3 backup: Not native
Containers: Standalone ✓

Our recommendation: if the environment is stable and you don't need the features in 15, the 12.10 → 14.10 path is the safest. It gives Informix 15 time to accumulate fixpacks. If it's a new project or you need native encryption, cloud backup, or Kubernetes-native deployment, go straight to 15.

What we don't recommend

Staying on 12.10 under Extended Support "until there's no other option". Extended Support has an end date and the longer you wait, the less flexibility you'll have. Rushing a migration is expensive. Planning one is a manageable project.

The team

Why training your team matters as much as migrating

We see it project after project. A team upgrades Informix and keeps operating with the same procedures from eight years ago. The server is on the new version; the knowledge isn't.

14.10 introduced AUTO_TUNE, which makes many manual adjustments unnecessary. 15 changes the approach to encryption, backup, and deployment. And with standalone containers there's a brand new operational model that requires Kubernetes skills a traditional DBA may not have. The official documentation is in the Informix 15 containerised deployment guide, but documentation doesn't replace hands-on training.

Our Informix courses

At SIXE, we've been delivering official IBM training for over 15 years. We've updated the entire Informix curriculum to v15 with new labs and custom documentation.

Informix 15 & 14.10 System Administration (SIFMX819G) — 3 days / 24 hours. From server architecture to production troubleshooting, with a 12.10→14.10/15 migration module and container deployment. Delivered by practising DBAs on a live Informix 15 instance.

The full Informix catalogue includes system administration, database administration, and custom migration courses. Available in English, Spanish and French: online, on-site, open or in-company.

View all Informix courses →


Migration, containers, or training?

Tell us what Informix you're running and we'll tell you where to start.

Current version, whether you're considering containers, whether your team needs training before moving production. No sales pitch.

Ceph Object Storage vs IBM COS: Migration Guide (2026)

Object Storage · April 2026

Ceph object storage vs IBM COS: when to migrate, and which way.

Three realistic paths for enterprise object storage at petabyte scale — and how we reach the right recommendation in each case. Fifteen years of production deployments and three live client cases on the table.

April 2026 · 11 min read · Infrastructure · Open Source

In 2026, if you're running a multi-petabyte object storage deployment and thinking about the next five years, you have three realistic options: IBM Cloud Object Storage (the Cleversafe successor), upstream Ceph backed by a support partner, or commercially packaged Ceph — typically IBM Storage Ceph.

We prefer open source and say so upfront. But we've also recommended IBM COS to specific clients knowing it was the right call — and talked clients out of migrations that would have padded our invoice but complicated their operations without real gain. This article explains when and why, with real cases.

Comparison of IBM COS vs IBM Storage Ceph vs upstream Ceph: selection criteria for object storage migration
The landscape

The 2026 landscape, plainly

The on-premises object storage market has been reshuffling for three years. IBM has repositioned COS multiple times since acquiring Cleversafe in 2015: first as a standalone product, then pushed toward IBM Storage Ready Nodes, then folded into the "cyber vault" narrative inside the Storage Defender portfolio. Legacy Cleversafe customers, many of them running decade-old deployments on Cisco UCS hardware now at end of life, are asking what the next five years look like before IBM changes the message again.

Ceph, meanwhile, has done the opposite. It has consolidated. The current release, Tentacle (20.2.1, April 2026), closes a maturity cycle that started with Reef and Squid. Active contributors include CERN, DigitalOcean, Bloomberg, OVH, Clyso, Red Hat/IBM, and SUSE. It is hard to find an infrastructure open source project with more sustained momentum.

Between them sits IBM Storage Ceph: upstream Ceph packaged and commercially supported by IBM, the direct successor to Red Hat Ceph Storage. Technically the same Ceph. Commercially, a per-capacity subscription with a vendor tier-1 SLA. It exists because some clients' procurement policies mandate a named enterprise vendor, and bare upstream Ceph doesn't pass that filter — even if it is technically identical.

Three products, three business models, three distinct client profiles.

The options

The three options at a glance

IBM COS
Patented IDA (SecureSlice), closed three-tier architecture, certified hardware list. Strongest in advanced regulatory compliance environments.
Proprietary
IBM Cloud Object Storage
Cleversafe successor · ClevOS
License: IBM proprietary
Hardware: Closed certified list
Protocols: Object S3 / Swift
5-yr cost: High
Lock-in: High
Ops complexity: Low
IBM Storage Ceph
Upstream Ceph with IBM subscription. Same codebase, tier-1 contractual SLA. For clients who need a named vendor in the contract.
Ceph + IBM
IBM Storage Ceph
Red Hat Ceph successor · ppc64le
License: IBM subscription
Hardware: Any x86 / ARM
Protocols: S3 · RBD · CephFS · NVMe-oF
5-yr cost: Medium-high
Lock-in: Medium
Ops complexity: Medium

The right option depends entirely on each client's operational reality.

All three work. The differences that matter are not about what they do, but how they are operated and what they cost over five years. IBM Storage Ceph and IBM COS do not compete — they serve fundamentally different client profiles. For a deeper comparison of Ceph against Storage Scale, GPFS, or NFS, see our dedicated article: IBM Storage Ceph vs Storage Scale, GPFS, GFS2, NFS and SMB.

Our position

Why we prefer open source

It's not ideology. It's the result of seeing, project after project, that a client with a competent in-house team or a capable partner gets the same operational stability on upstream Ceph as on any commercial alternative — with significantly more freedom and at lower cost.

Proprietary lock-in is not just about hardware — it's about roadmap. If IBM repositions COS again — and it has happened multiple times since 2015 — the client watches the change from the sidelines. With Ceph, if your commercial distributor changes strategy or raises prices, you move to upstream or another distributor without migrating data. The portability is real, not marketing.

Community continuity is a guarantee no single vendor can match. A proprietary product depends ultimately on a spreadsheet at the vendor's headquarters. Ceph has enough institutional contributors that when one leaves — which has happened — the project continues. For infrastructure you plan to run for fifteen or twenty years, that matters.

Architectural versatility pays for itself. Object storage today, block tomorrow for virtualisation, file when needed, NVMe-oF when it becomes relevant. All on the same hardware, maintained by the same team. COS only does object well. Separating platforms by protocol doubles teams, procedures, and support contracts. For cases where Ceph runs as an NFS high-availability backend, we've documented the process: NFS high availability with Ceph Ganesha.

Operational transparency is its own kind of security. When something breaks in Ceph, you have the code. When something breaks in COS, you open a ticket and wait. For serious technical teams, the first is worth more than it appears in a feature comparison.

The important nuance

Open source is not free. It is different. What you save in licensing you spend in team hours — in-house or contracted. If you have neither the team nor a partner acting as its extension, the equation can reverse. That's why the operational question matters as much as the philosophical one: who operates this day to day?

Technical honesty

When IBM COS is the right answer

If we were open source absolutists we'd be selling smoke — and there's enough in this market already. COS is the correct choice for a fairly specific client profile.

Small operational teams with no deep SDS skills and no budget to hire them or outsource continuously. Ceph's learning curve is real. If the organisation can't absorb it, a packaged product like COS reduces the operational problem surface.

Regulated sectors with very specific compliance requirements — audited WORM, SEC 17a-4 retention, Compliance Enabled Vaults, NENR. IBM's ecosystem is very mature here and audits move faster when the entire stack is from one vendor with existing certifications.

Corporate "single throat to choke" policy with explicit preference for vendor tier-1. In some organisations (conservative banking, public sector, defence) the CISO won't accept an architecture without a contractual SLA. Arguing with that policy from outside is a waste of time; the right move is helping the client choose the packaged product that fits best.

IBM ecosystem already deployed. If the client already has Spectrum Protect, Storage Defender, Fusion, Power, or Z, consolidating object storage within the same vendor makes operational and commercial sense.

Very large scale (high petabytes or exabytes) with predictable, stable workloads, where the operational simplicity of a mature product offsets the licensing cost. We've seen clients with more than an exabyte under IBM support for whom migration would be a three-year project worth tens of millions; in those cases the answer is to stay and optimise.

What doesn't justify staying on COS

Inertia, uninformed fear of open source, or taking the annual licensing line as a given without questioning it. Those we always question.

Most of the market

When upstream Ceph with a good partner is the answer

This is the scenario where we believe most of the market sits — even if it doesn't always know it.

Profiles where upstream Ceph wins clearly:

  • Client with a competent technical team in Linux and infrastructure, or willing to engage a continuous support partner.
  • Medium to large scale, from hundreds of TB to tens of PB, where commercial subscription starts to hurt the budget.
  • Need or intent to unify object, block, and file storage on the same platform.
  • Hardware refresh underway with no appetite for tying to a single vendor's certified list.
  • Native Kubernetes integration via Rook, if a cloud-native platform is on the roadmap.
  • A preference, simply, for being able to see what's under the hood.

Here we need to address a myth that has circulated for years: that Ceph is hard. It's half true. Ceph is complex — as any serious distributed system is — but it's not chaotic or unstable. The difference between a Ceph cluster that causes problems and one that runs for years without incidents is not in the software. It's in deployment design, placement group and balancer tuning, coherent hardware selection, monitoring, and having someone experienced who knows what to do when something unusual appears in ceph health detail. We have a dedicated article on the most common Ceph error and how to fix it.

The problem is not Ceph. The problem is deploying Ceph without expertise. That's a problem for any complex infrastructure, not a product defect.

The honest question. Not "can I handle Ceph on my own?" — but "do I have someone, in-house or contracted, who has my back?" If yes, upstream Ceph delivers the best cost-to-result ratio in the market. If no, find that someone before signing anything.

A well-operated Ceph cluster performs just as well with upstream support plus a competent partner as with an enterprise subscription. The real difference is who picks up the phone at three in the morning. If you're evaluating Ceph against lighter object storage alternatives, our Ceph vs MinIO 2026 article covers that in detail.

The middle option

IBM Storage Ceph: the middle option

We'll be more direct here, because this product gets written about with surprisingly little clarity.

IBM Storage Ceph is, technically, Ceph. The same Ceph you download from the project website. Packaged, tested, integrated with IBM-specific tooling, commercially supported with an SLA, and certified in several regulated environments. That is what you pay for. Technically you get nothing you couldn't have with upstream.

When it makes sense to pay for it:

  • Public or private procurement contracts that require a tier-1 vendor with contractual support, with no room for negotiation.
  • Organisations where internal purchasing policy mandates enterprise support without exception, and there is no way to qualify an external partner as a substitute.
  • Clients who already have an IBM ELA where adding Storage Ceph to the package is reasonable against list price.
  • Sectors with audits where the manufacturer's name on the invoice shortens the process.

When it's not worth it: in practically every other case. If your compliance doesn't require it and you have a decent partner, paying a subscription for upstream is an avoidable overhead. At tens of petabytes scale, the difference between a commercial subscription and a partner supporting upstream can be hundreds of thousands of euros per year. At exabyte scale, it moves to millions. For most clients, that money is better reinvested in team, hardware, or anything else.

Plain summary. IBM COS = complete product, single vendor, high cost, high lock-in, low operational complexity. IBM Storage Ceph = community Ceph with an IBM invoice, contractual reassurance, medium-high cost. Upstream Ceph with a partner = maximum control, low cost, requires maturity — in-house or borrowed.

If your reality pushes you toward the first or second, we'll be there to help you operate it well. But most clients we work with discover, after an honest assessment, that the third fits them better than they thought.

Real cases

Three real client cases

Anonymised, because NDAs are NDAs. The lesson is always the same: the right question is not "which is better in the abstract" but "which fits this specific operational reality".

Real-world cases · Three profiles, three different decisions
A
European telco operator · 50 PB · IBM COS → Ceph upstream
Cisco UCS M4 hardware at end of life; refresh on IBM-certified hardware was prohibitively expensive. COS licensing cost had been questioned internally for years. Strategic intent to consolidate object and block on a single stack for the internal Kubernetes platform. 18-month phased migration with dual-running for critical data. Outcome: significantly reduced total operational cost, client team fully autonomous, SIXE as second-level support. Three years on, the cluster remains stable.
Hardware EoL · Licensing cost · K8s consolidation
B
Regulated financial institution · 8 PB · Stayed on IBM COS
Called us to evaluate a potential migration motivated by licensing cost. We ran the full assessment. Our recommendation was not to migrate: small operational team with no budget or culture to absorb Ceph autonomously, SEC 17a-4 Compliance Enabled Vault requirements deeply embedded in annual audits, and legitimately high aversion to operational risk. We earned less than a migration would have generated — and gained a long-term client. We continued working with them optimising the existing deployment and planning the next hardware refresh.
SEC 17a-4 · Small team · Answer: stay
C
Public sector organisation · 3 PB · Self-managed Ceph → IBM Storage Ceph
Ceph deployed internally without sufficient expertise: unstable cluster, recurring incidents that had worn out the operational team. A new tender requirement mandated tier-1 vendor contractual support — upstream was off the table. We accompanied them through the migration to IBM Storage Ceph, environment stabilisation, and team training. They ended with a healthy cluster and peace of mind. Not the cheapest path, but the only viable one given the external constraints.
Tender: vendor tier-1 · Unstable cluster · → IBM Storage Ceph
What nobody tells you

What most comparisons don't tell you

Four things that never appear in vendor whitepapers and that we have seen trip up many technical teams.

Migrating at petabyte scale is not copying data

It's migrating configuration: lifecycle policies, retention, legal holds, ACLs, CORS, bucket policies, versioning, event notifications, tagging, replication. You migrate context as much as bytes. A poorly scoped migration project discovers this halfway through and finds its timeline has doubled.
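One way to scope that context early is to inventory each bucket's configuration before any data moves. The sketch below assumes a boto3-style S3 client is passed in (the method names are the standard boto3 S3 client calls); the function name and report layout are illustrative, and not every S3-compatible endpoint implements every call, so failures are recorded rather than raised.

```python
from typing import Any, Dict

# boto3-style S3 client methods that return per-bucket configuration.
CONFIG_CALLS = {
    "versioning": "get_bucket_versioning",
    "lifecycle": "get_bucket_lifecycle_configuration",
    "policy": "get_bucket_policy",
    "acl": "get_bucket_acl",
    "cors": "get_bucket_cors",
    "tagging": "get_bucket_tagging",
    "notifications": "get_bucket_notification_configuration",
    "replication": "get_bucket_replication",
    "object_lock": "get_object_lock_configuration",
}


def inventory_bucket(client: Any, bucket: str) -> Dict[str, Any]:
    """Collect every bucket-level setting the endpoint is willing to report."""
    inventory: Dict[str, Any] = {}
    for name, method in CONFIG_CALLS.items():
        try:
            response = dict(getattr(client, method)(Bucket=bucket))
            response.pop("ResponseMetadata", None)  # transport noise, not config
            inventory[name] = response
        except Exception as exc:
            # Unset configuration and unsupported calls both surface as errors;
            # keep the message so a human can tell them apart later.
            inventory[name] = {"unavailable": str(exc)}
    return inventory
```

With boto3 installed, the client would typically come from `boto3.client("s3", endpoint_url=...)` pointed at the source system. Per-object settings such as legal holds are not covered here and need a separate pass over the objects themselves.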

The S3 dialect is not uniform

Between AWS S3, Ceph RGW, and IBM COS there are subtle differences in headers, LIST behaviour with large object counts, multipart upload edge cases, and versioning semantics. Client applications sometimes need adjustment. Test — don't assume.
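A minimal way to "test, don't assume" is to diff object listings from both sides after a trial copy. The sketch below works on plain dictionaries (how you obtain the listings is up to you), and it assumes the common convention that multipart ETags carry a `-<part count>` suffix and are therefore not comparable across systems. The names `ObjectInfo` and `diff_listings` are illustrative.

```python
from typing import Dict, List, NamedTuple


class ObjectInfo(NamedTuple):
    size: int
    etag: str


def is_multipart_etag(etag: str) -> bool:
    """Multipart ETags conventionally look like "<hex>-<part count>"."""
    return "-" in etag.strip('"')


def diff_listings(source: Dict[str, ObjectInfo],
                  target: Dict[str, ObjectInfo]) -> Dict[str, List[str]]:
    """Compare two key -> ObjectInfo listings and report discrepancies."""
    report: Dict[str, List[str]] = {
        "missing": [], "unexpected": [], "size_mismatch": [], "etag_mismatch": [],
    }
    for key, src in sorted(source.items()):
        dst = target.get(key)
        if dst is None:
            report["missing"].append(key)
        elif src.size != dst.size:
            report["size_mismatch"].append(key)
        elif (not is_multipart_etag(src.etag)
              and not is_multipart_etag(dst.etag)
              and src.etag.strip('"') != dst.etag.strip('"')):
            # Only single-part ETags are compared; multipart ones depend on
            # the part size used by whoever performed the upload.
            report["etag_mismatch"].append(key)
    report["unexpected"] = sorted(k for k in target if k not in source)
    return report
```

An ETag mismatch here is a prompt to investigate, not proof of corruption; a checksum computed over the object body is the stronger validation.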

Data protection philosophy changes between products

COS's IDA, Ceph's erasure coding, and traditional triple replication are not interchangeable in terms of durability guarantees or the failure profiles they tolerate. Translating a COS IDA 10/8/7 to a Ceph erasure coding profile requires judgment, not arithmetic.
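The arithmetic half of that comparison is the easy part, and a short sketch shows why it is not enough. The helpers below cover only Ceph-style replication and k+m erasure coding; they deliberately do not attempt to map COS IDA parameters, and the function names are illustrative.

```python
def replication_profile(copies: int) -> dict:
    """N full copies: N times the raw space; data survives losing N-1 copies."""
    return {"raw_per_usable": float(copies), "failures_tolerated": copies - 1}


def erasure_profile(k: int, m: int) -> dict:
    """k data chunks + m coding chunks: data stays recoverable after losing up to m."""
    return {"raw_per_usable": (k + m) / k, "failures_tolerated": m}


for label, profile in [
    ("replica x3", replication_profile(3)),
    ("EC 4+2", erasure_profile(4, 2)),
    ("EC 8+3", erasure_profile(8, 3)),
]:
    print(f"{label}: {profile['raw_per_usable']:.3f}x raw, "
          f"tolerates {profile['failures_tolerated']} lost chunks/copies")
```

Two profiles with similar numbers here can still behave very differently in production: failure domains, recovery traffic, availability thresholds during degraded operation and small-object overhead are where the judgment comes in.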

Day-to-day operations are radically different

In COS you diagnose with storagectl list and the Manager administration shell. In Ceph you work with ceph -s, ceph osd tree and ceph health detail, and you reason in terms of placement groups, OSDs and CRUSH maps. Retraining a team takes six to twelve months of effective transition. Budget for it; it cannot be a project footnote.
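To give a feel for the Ceph side, here is a small sketch that condenses the JSON form of the health report into a few readable lines. It assumes the output of `ceph health detail --format json` has a top-level `status` plus a `checks` map keyed by health-check code; exact fields can vary by release, and the sample document is illustrative, not captured from a real cluster.

```python
import json
from typing import List

# Illustrative sample only; a real report comes from the cluster itself.
SAMPLE = """
{
  "status": "HEALTH_WARN",
  "checks": {
    "PG_DEGRADED": {
      "severity": "HEALTH_WARN",
      "summary": {"message": "Degraded data redundancy: 2 pgs degraded", "count": 2},
      "detail": [{"message": "pg 1.a is active+undersized+degraded"}]
    }
  }
}
"""


def summarise_health(raw: str) -> List[str]:
    """Turn a JSON health report into one line per check plus its details."""
    report = json.loads(raw)
    lines = [f"overall: {report.get('status', 'UNKNOWN')}"]
    for code, check in sorted(report.get("checks", {}).items()):
        summary = check.get("summary", {}).get("message", "")
        lines.append(f"{check.get('severity', '?')} {code}: {summary}")
        for item in check.get("detail", []):
            lines.append(f"    {item.get('message', '')}")
    return lines


print("\n".join(summarise_health(SAMPLE)))
```

A script like this is a convenience for dashboards or runbooks. It does not replace knowing what each health check means and what to do about it.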

How we work

How we work at SIXE

The approach is straightforward and has been working for years. First an assessment: we review the current architecture, actual workloads, the operational team's profile, regulatory constraints, the three-to-five-year budget, and the technically viable options. The output is a reasoned recommendation with alternatives — and sometimes it is "stay where you are". We have said that more than once.

Then a design, if there is migration or substantial change. Target architecture, phased plan, operational windows, risk matrix, runbooks. No two migrations are alike.

Then execution. Phased migration with dual-running where possible, data validation, functional QA with client applications, post-cutover tuning.

And finally handover with mentoring to the client team, plus ongoing Ceph technical support if they want us at the other end of the line going forward. Many clients prefer this model — SIXE as a team extension — over a commercial subscription. It is exactly what makes upstream Ceph viable in serious production environments. For teams that want to build internal capability, we offer a Ceph administration course and a practical IBM Storage Ceph course.

Our team diagnoses a DONT-START-DAEMON on a ClevOS slicestor with the same ease as a placement group inactive+incomplete on Ceph. We are not an "IBM partner" or a "Ceph partner". We are an object storage partner, and we know all three options well enough to recommend whichever one actually fits.


Running object storage that needs a review?

An honest technical conversation. No sales pitch.

Tell us about your current deployment — capacity, workloads, team, regulatory constraints. We'll tell you what makes sense. If the answer is "stay where you are", we'll say that too.
