⚡ Advanced Training

Ceph Production Operations | Course

When a 200TB cluster crashes at 3 AM, you need answers, not theory

3 DAYS Intensive · 100% Hands-on · REAL Scenarios
🐧

Distribution-agnostic

IBM Storage Ceph, Red Hat, Ubuntu, Rocky, Alma Linux, or upstream Ceph

⚠️ 3:00 AM: CLUSTER CRITICAL

  • 💥 OSD Failure: 12 OSDs down
  • 📁 CephFS: metadata corrupt
  • Performance: IOPS -80%
  • 🔧 Recovery: plan active

🎯 You'll learn to solve:

Critical failures in 200TB+ clusters
Recovery of 40TB corrupted CephFS
Extreme tuning for AI/ML (500TB/day)
Troubleshooting under 24/7 pressure

👥 Who is this for?

Certified administrators or those with production experience who need to master real-world critical scenarios that vendors don't teach

📚

Course Structure

An intensive 3-day program designed to tackle real-world crises and optimize production clusters at petabyte scale

01

Advanced Performance Engineering & Forensics

From architecture to forensic troubleshooting in production

☀️

Morning: Architectural Optimization

  • BlueStore internals: RocksDB tuning, compaction, write amplification
  • CPU optimization: C-states impact (labs showing 5x degradation), NUMA
  • Network: 100GbE patterns, TCP tuning, nf_conntrack
  • NVMe-specific: reactor tuning, bdevs_per_cluster optimization
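To give a feel for this module's material, here is a minimal tuning sketch. All values are illustrative placeholders, not recommendations from the course; the commands assume root on an OSD node with kernel-tools installed and a running cluster:

```shell
# Check which CPU idle states (C-states) are enabled -- deep C-states
# add wakeup latency to OSD request handling.
cpupower idle-info

# Pin latency-sensitive OSD nodes to the performance governor.
cpupower frequency-set -g performance

# Raise conntrack limits so a busy 100GbE node does not drop
# connections (value is illustrative -- size it for your peer count).
sysctl -w net.netfilter.nf_conntrack_max=1048576

# Give BlueStore OSDs a larger memory target (8 GiB here, illustrative)
# so the RocksDB block cache and onode cache have room to breathe.
ceph config set osd osd_memory_target 8589934592
```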
🌅

Afternoon: Forensic Troubleshooting

  • Diagnostic toolchain: blktrace, perf, objectstore-tool
  • Real case studies: NVMe degradation, post-upgrade OSD flapping
  • Advanced PG lifecycle: stuck states, manual intervention
  • Labs: Cluster with real problems to diagnose
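A sketch of the diagnostic toolchain named above. The OSD id, data path, and block device are placeholders, and `ceph-objectstore-tool` requires the target OSD to be stopped first:

```shell
# List PGs stuck in unclean states -- the starting point for most
# "cluster won't heal" investigations.
ceph pg dump_stuck unclean

# Inspect a stopped OSD's on-disk state directly (path varies with
# your deployment; containerized clusters mount it differently).
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op list-pgs

# Trace block I/O on an OSD device for 10 seconds, then summarize,
# to spot latency outliers below the Ceph layer.
blktrace -d /dev/nvme0n1 -w 10 -o trace
blkparse -i trace | less
```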
02

Disaster Recovery, Multi-Site & Petabyte Scaling

Extreme recovery and multi-site architectures

☀️

Morning: Advanced DR

  • Edinburgh 40TB case: complete error chain and recovery procedures
  • CephFS disasters: metadata corruption, MDS failure handling
  • RBD mirroring: pool vs image-based, failover automation
  • Physical DR: disk extraction, journal, whoami preservation
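The RBD mirroring workflow covered in this block can be sketched as follows. Pool and image names are hypothetical, and exact steps depend on your mirroring mode and Ceph release:

```shell
# Enable image-mode mirroring on a pool (per-image opt-in, as opposed
# to pool mode, which mirrors every journaled image).
rbd mirror pool enable mypool image

# Opt one image into snapshot-based mirroring.
rbd mirror image enable mypool/vm-disk-1 snapshot

# Failover at the secondary site: the dead primary cannot be demoted
# cleanly, so force-promote the replica and resync later.
rbd mirror image promote --force mypool/vm-disk-1
```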
🌅

Afternoon: Multi-Site & Petabytes

  • RGW multisite: master zone failure, manual promotion, sync fairness
  • WAN planning: formulas for 1 GbE per 8TB daily ingest
  • Petabyte challenges: CERN 30PB (7,200 OSDs), 310M objects
  • Labs: Multi-site failover and recovery simulation
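The WAN sizing rule above (roughly 1 GbE per 8 TB of daily ingest) can be checked with quick arithmetic. This is our sketch, not a formula from the course materials:

```python
def required_gbps(tb_per_day: float, overhead: float = 1.0) -> float:
    """Sustained WAN bandwidth (Gbit/s) needed to replicate a daily
    ingest volume, with an optional multiplier for protocol and
    replication overhead. Assumes 1 TB = 1e12 bytes, 1 day = 86,400 s."""
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / 86_400 / 1e9 * overhead

# 8 TB/day needs ~0.74 Gbit/s sustained, so a 1 GbE link fits with
# roughly 26% headroom for bursts and catch-up after outages.
print(round(required_gbps(8), 2))   # 0.74
```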
03

Security, AI/ML Workloads & Cost Engineering

Enterprise security and optimization for modern workloads

🔒

Morning: Security Hardening

  • Encryption: LUKS/dmcrypt OSDs, msgr2 secure, RGW SSE-S3/KMS
  • Key management: rotation (Squid 19.2.3+), Barbican integration
  • Compliance: HIPAA architecture, GDPR, audit logging
  • Threat detection: monitoring patterns, vulnerability management
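As an illustration of the msgr2 secure mode mentioned above, a config fragment (not a complete hardening checklist; apply during a maintenance window and verify client compatibility first):

```shell
# Require encrypted msgr2 transport for daemon-to-daemon, service,
# and client traffic (default allows crc-only framing).
ceph config set global ms_cluster_mode secure
ceph config set global ms_service_mode secure
ceph config set global ms_client_mode secure

# At-rest encryption is set at OSD creation time, e.g. via
# "encrypted: true" in a cephadm OSD service spec (LUKS/dmcrypt).
```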
🤖

Afternoon: AI/ML & ROI Engineering

  • S3 Select: Trino integration (2.5x-9x performance), analytics pushdown
  • AI/ML patterns: checkpointing, parallel access optimization
  • TCO analysis: EC efficiency, commodity hardware savings
  • Hybrid architectures: OpenStack DCN, edge-to-core, multi-cloud
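The erasure-coding-versus-replication efficiency argument behind the TCO analysis can be made concrete with simple arithmetic (our sketch, not course material):

```python
def usable_ratio_ec(k: int, m: int) -> float:
    """Usable fraction of raw capacity for a k+m erasure-coded pool."""
    return k / (k + m)

def usable_ratio_replica(size: int) -> float:
    """Usable fraction of raw capacity for size-way replication."""
    return 1 / size

# A 4+2 EC pool stores 1 TB of data in 1.5 TB raw (66.7% efficient),
# versus 3 TB raw for 3x replication (33.3%) -- half the raw capacity
# for the same usable data, at the cost of reconstruction overhead.
print(round(usable_ratio_ec(4, 2), 3))     # 0.667
print(round(usable_ratio_replica(3), 3))   # 0.333
```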
🧪

Lab Specifications

Realistic enterprise cloud infrastructure

🖥️ Infrastructure

  • Real 5-6 node cluster
  • 500GB+ pre-populated data per student
  • 24/7 access for 7+ days post-course

⚠️ Real Scenarios

  • Disk failures & network partitions
  • Simulated metadata corruption
  • Injected performance degradation

🔧 Tools

  • blktrace, perf, objectstore-tool
  • Pre-installed debugging symbols
  • Real datasets with I/O patterns

🐧 Supported Distributions and Versions

Available distributions:

  • Rocky Linux 9
  • Ubuntu 24.04 LTS
  • Red Hat Enterprise Linux

Ceph versions:

  • Upstream Squid 19.2+
  • IBM Storage Ceph 7.1
  • Red Hat Ceph Storage 7.x
📅

Upcoming Sessions

Intensive 3-day training designed for small groups (maximum 10 participants) to maximize interaction and collaborative troubleshooting.
🏢

In-Person

At our facilities with full access to labs and specialized equipment

Hands-on Experience
🚀

On-Site

At your organization for teams of 4+ people with customized configuration

Team Training
🌐

Remote

With dedicated cloud lab and full access to real-time practice resources

Cloud Access
💪

Ready to handle critical scenarios with confidence?

Request information about upcoming dates, detailed curriculum, and terms. Response guaranteed within 24 hours.

Or call us directly and we'll answer your questions

📧 Email Support
💬 Live Chat
📅 Flexible Scheduling

Ceph technical training

Ceph Storage - The most comprehensive series of courses on the market

Ceph Administration

Fundamentals and deployment

See course →
Ceph Advanced

Advanced configuration and EX260

See course →
Ceph Production Operations

Troubleshooting and DR

You are in the course →

Request this course

Frequently Asked Questions

Do I need to have completed the previous courses?

It’s not mandatory, but you DO need equivalent knowledge. This course assumes you master: Ceph architecture (MON/OSD/MGR), pool/PG/CRUSH management, basic troubleshooting, and have practical experience managing clusters in production (2+ years or equivalent courses). If you completed our basic and advanced courses, you’re perfectly prepared.

Do I need a certification to enroll?

Certification is NOT a requirement. What matters is your real practical experience. If you’ve been administering Ceph in production for years, with or without managed services, and know the fundamental concepts well, this course is for you. In fact, many of our best students don’t have certification but bring real production problems that we solve together.

Which distribution and Ceph version do the labs use?

The course is completely distribution-independent. Labs can be configured with IBM Storage Ceph, upstream Ceph (Squid 19.2+), Red Hat Ceph Storage, or whichever version you prefer. The troubleshooting, DR, and optimization techniques we teach are universal – they work the same on Rocky Linux, Ubuntu, RHEL or Alma Linux. You decide which configuration best matches your production environment.

How does this course differ from the advanced course?

The advanced course covers deployment, advanced configuration, and preparation for EX260. This third course focuses 100% on critical production operations: forensic troubleshooting when everything fails, REAL disaster recovery (not simulations), advanced performance engineering, and complex multi-factor scenarios. They’re complementary – think of the advanced course as “how to configure it well” and this one as “what to do when it fails badly”.

What equipment do I need?

Laptop with SSH client, modern web browser, and stable internet access. The complete lab runs on enterprise cloud infrastructure – you don’t need to install anything locally. We recommend 16GB RAM and a large screen (or dual monitors) to manage multiple terminals and windows simultaneously during troubleshooting.

Can the course be taken remotely?

Yes. We offer three modalities: (1) In-person at our facilities for maximum interaction, (2) On-site at your organization for teams of 4+ people, and (3) Remote with dedicated cloud lab. The remote modality includes all the same practices and 24/7 lab access. Contact us to discuss which modality best fits your needs.

Do you provide an official certification?

We issue a completion certificate with details of content and hours completed. We currently don’t offer our own certification because the market still values demonstrable experience and vendor certifications (EX260, etc.) more. However, the skills you acquire here are verifiable in technical interviews and real situations, which is what really counts.

What if I get stuck in the labs?

The labs are designed to challenge, not to frustrate. We work in small groups with direct instructor support. If you get stuck, that’s part of the learning – we analyze together where you failed and why. The goal is for you to leave prepared for real scenarios, not to “pass” academic exercises. You maintain lab access for 7 days post-course to practice at your own pace.

Ceph course
SIXE