
Ceph, OpenStack & Kubernetes for on‑premise AI inference

Build your own AI inference infrastructure with a fully open source stack. No proprietary licenses, no cloud vendor lock‑in, full sovereignty over your data and models. We design the architecture, train your team with certified courses, and walk you through deployment to production.

Let's talk about your project · Message us on WhatsApp
inference@your-datacenter ~
$ ceph -s | head -4
  cluster: health HEALTH_OK
  osd: 12 osds, 12 up · data: 2.4 TiB stored

$ kubectl get pods -n inference
  vllm-llama70b-0   Running   gpu: A100
  vllm-mistral-0    Running   gpu: L40S
  triton-embed-0    Running   gpu: L40S

$ openstack server list --project ai
  | gpu-worker-01 | ACTIVE | nvidia-a100-80g |
  | gpu-worker-02 | ACTIVE | nvidia-l40s |

$

P99 latency: 47 ms
Throughput: 3.2k tok/s

Ceph · OpenStack · Kubernetes · Canonical / Ubuntu · IBM
Why on-premise

Three reasons organisations are bringing AI inference in-house

01

Predictable costs

Cloud GPU bills swing 30-40% between billing cycles. With your own infrastructure, costs are fixed, depreciable and predictable. Every token you generate gets cheaper over time.

02

Zero vendor lock-in

Proprietary APIs, closed formats, captive orchestration. Your fine-tuned models and curated datasets live on someone else's infrastructure. With open source, you can always move everything.

03

Regulatory compliance

GDPR and the EU AI Act require you to know where your data is processed and who has access. Running inference on sensitive data in public cloud is a regulatory risk that grows every quarter.

The AI inference stack

Three battle-tested technologies. Zero proprietary dependencies.

The same open source stack powering AI Factories at the Barcelona Supercomputing Center and sovereign AI infrastructure across Europe. We configure it in your datacenter and train your team to operate it.


Ceph

DISTRIBUTED STORAGE

Unified object, block and file storage. Store model weights (tens of GB each), massive datasets and inference results. S3-compatible. Scales from terabytes to petabytes with zero downtime.

S3 API · RBD · CephFS · Erasure coding
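
As a sketch of how model artifacts land in Ceph: the commands below use the standard AWS CLI against a RADOS Gateway endpoint. The hostname, port, local path and bucket name are placeholders, not a reference deployment.

$ aws --endpoint-url http://rgw.dc.internal:7480 s3 mb s3://models
$ aws --endpoint-url http://rgw.dc.internal:7480 s3 cp ./llama-70b s3://models/llama-70b --recursive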

OpenStack

INFRASTRUCTURE ORCHESTRATION

Your enterprise private cloud. Full GPU management with PCI passthrough, vGPU and NVIDIA MIG. Isolated networks per project, automated provisioning and integrated bare metal management.

Nova · Neutron · Ironic · Senlin
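
A minimal sketch of what GPU provisioning looks like with Nova's PCI passthrough. It assumes an "a100" PCI alias has already been configured in nova.conf; the flavor, image and network names are illustrative.

$ openstack flavor create --vcpus 16 --ram 131072 --disk 200 \
    --property "pci_passthrough:alias"="a100:1" gpu-a100
$ openstack server create --flavor gpu-a100 \
    --image ubuntu-24.04 --network ai-net gpu-worker-03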

Kubernetes

INFERENCE ORCHESTRATION

Native GPU scheduling, inference pod autoscaling, vLLM and TensorRT-LLM deployment in containers. The CNCF-certified standard for running AI workloads in production at any scale.

GPU Operator · Kubeflow · vLLM · Triton
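
To illustrate native GPU scheduling: a minimal pod spec requesting one GPU through the device plugin that the NVIDIA GPU Operator installs. The namespace and model name are placeholders.

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: vllm-test
  namespace: inference
spec:
  containers:
  - name: vllm
    image: vllm/vllm-openai:latest
    args: ["--model", "mistralai/Mistral-7B-Instruct-v0.3"]
    resources:
      limits:
        nvidia.com/gpu: 1   # scheduled only onto nodes that expose a GPU
EOF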

Reference architecture

01 — DATA

Ceph S3

Models · Datasets

02 — INFRA

OpenStack

GPU · Network · Bare metal

03 — ORCHESTRATION

Kubernetes

vLLM · Triton · Kubeflow

04 — PRODUCTION

Inference

APIs · Agents · RAG
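
To make layer 04 concrete: once a vLLM pod is serving, applications call its OpenAI-compatible API. The service hostname and model name below are placeholders for whatever your deployment exposes.

$ curl http://vllm.inference.svc.cluster.local:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "llama-70b", "messages": [{"role": "user", "content": "Hello"}]}'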

Zero vendor lock-in

7% more GPU efficiency vs VMware (Source: FPT / OpenInfra, 2025)

~50% storage savings vs cloud (Source: OpenMetal, 2025)

100% GDPR & EU AI Act compliant

What's included

From your datacenter to serving models in production

We don't sell hardware or tie you to maintenance contracts. We transfer the knowledge so your team becomes fully autonomous.

Assessment & architecture design

We audit your workloads, latency requirements, data volumes and regulatory obligations. Deliverable: full architecture design including GPU node sizing, network topology, Ceph storage strategy and a 12-24 month capacity plan.
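
To make the sizing step concrete, here is the kind of back-of-the-envelope we run during assessment, with illustrative round numbers and before quantization or batching considerations:

  FP16 weights for a 70B model ≈ 70 × 10⁹ params × 2 bytes ≈ 140 GB
  1 × A100 80 GB   → the weights alone do not fit
  2 × A100 80 GB   → 160 GB total: fits with tensor parallelism,
                     leaving ~20 GB for KV cache and activations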

Certified training for your team

Hands-on courses in Ceph administration, OpenStack for GPU workloads, and Kubernetes with accelerator scheduling. As IBM Business Partner and Canonical Partner, our certifications carry international weight.

Assisted deployment in your environment

Hands-on installation: Ceph clusters, OpenStack with native GPU support (PCI passthrough, vGPU, MIG), Kubernetes with NVIDIA GPU Operator, and first real inference workloads with vLLM or TensorRT-LLM serving production traffic.
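
For reference, the NVIDIA GPU Operator step typically looks like the following Helm install; the release and namespace names are conventional, not mandated.

$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
$ helm install gpu-operator nvidia/gpu-operator \
    -n gpu-operator --create-namespace --wait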

Ongoing support & evolution

GPU performance tuning, stack upgrades and advanced training as you scale. From your first self-hosted LLM to a full agentic AI platform with RAG, multiple models and production APIs.

Target sectors

Built for organisations where data cannot leave the building

Healthcare

Patient records, AI-assisted diagnostics, clinical compliance.

Banking & insurance

Real-time fraud detection, credit scoring, EBA & ECB-regulated data.

Government & defence

Digital sovereignty, AI for public services, EU AI Act, classified data.

Manufacturing

Machine vision, predictive maintenance, edge inference in OT environments.

Who's behind this

A technical partner that trains you not to need one

We're not a hyperscaler or a hardware vendor. We're an IT training consultancy with over a decade of experience deploying open source in production. Our job is done when your team runs the whole thing on their own.


IBM Business Partner

Official training in IBM Power, Storage and AI technologies with internationally recognised certifications.


Canonical Partner

Ubuntu, Ceph, OpenStack, MicroK8s and Juju. The Canonical ecosystem as the foundation of our open source infrastructure.


European & multilingual

We operate in English, Spanish and French. GDPR and EU AI Act expertise built in.

Frequently asked questions

What people ask before getting started

What hardware do I need for on-premise AI inference?

It depends on the models you need to serve. For models up to 70B parameters, a minimum of 2-3 servers with NVIDIA A100 GPUs (80 GB VRAM) or L40S is a solid starting point. For storage, we recommend at least 3 nodes with NVMe drives for the Ceph cluster. During the assessment phase we size the exact configuration based on your models, target latency and request volume.
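
As a concrete illustration of serving a model that size (the model name and GPU count are examples, not a recommendation), vLLM can shard the weights across two GPUs with tensor parallelism:

$ vllm serve meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 2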

How much does it cost to build a Ceph + OpenStack + Kubernetes cluster?

The software is 100% open source, so there are no licensing costs. The investment depends on hardware (GPUs, servers, networking) and project scope. A minimum viable configuration for inference can start from 3-5 nodes. Our service includes assessment, architecture design and assisted deployment. We can also train your team. Get in touch for a tailored proposal.

Can I use this stack for training, or only for inference?

The Ceph + OpenStack + Kubernetes stack supports inference, fine-tuning and training. However, large-scale pre-training requires GPU clusters interconnected with high-speed fabrics (InfiniBand/RoCE). Most organisations use this infrastructure for inference and fine-tuning, and rely on cloud or supercomputing for pre-training.

How does this compare to using GPUs on a public cloud like AWS or Azure?

Three key differences: cost (fixed and depreciable vs variable and growing), data sovereignty (your data never leaves your datacenter), and zero vendor lock-in (the entire stack is open source and portable). At medium inference volumes, on-premise infrastructure typically pays for itself within 12-18 months versus cloud.
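
A purely illustrative version of that payback arithmetic, with assumed prices rather than quotes:

  Cloud:    1 × A100 80 GB on demand ≈ $3/hr (assumption)
            8 GPUs running 24/7 → 8 × 3 × 8,760 ≈ $210k/year
  On-prem:  8-GPU server plus networking ≈ $250-300k one-off (assumption)
            → break-even after roughly 12-18 months of sustained load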

Do I need a specialised team to operate the infrastructure?

Our service includes certified training for your team in Ceph, OpenStack and Kubernetes administration. If your team has Linux and networking experience, they can operate the infrastructure after the training. We also offer ongoing support during the initial phases until your team is fully autonomous.

Is this infrastructure compliant with GDPR and the EU AI Act?

Yes. Because it's on-premise, you have full control over where data is stored and processed. There are no cross-border transfers or dependency on external cloud providers. This greatly simplifies compliance with GDPR, the EU AI Act, and sector-specific regulations like the EBA guidelines for banking.

Next step

Got an AI inference project? Tell us the details.

We'll review your technical requirements, data volumes and compliance constraints. A straight conversation between professionals to see if it makes sense to work together.

Contact SIXE · Prefer WhatsApp?