Porting MariaDB to IBM AIX (Part 1): 3 Weeks of Engineering Pain

Bringing MariaDB to AIX, the Platform That Powers the World’s Most Critical Systems

There are decisions in life you make knowing full well they’ll cause you some pain. Getting married. Having children. Running a marathon. Porting MariaDB 11.8 to IBM AIX.

This (Part 1) is the story of the last one — and why I’d do it again in a heartbeat.

Chapter 1: “How Hard Can It Be?”

It all started with an innocent question during a team meeting: “Why don’t we have MariaDB on our AIX systems?”

Here’s the thing about AIX that people who’ve never worked with it don’t understand: AIX doesn’t mess around. When banks need five-nines uptime for their core banking systems, they run AIX. When airlines need reservation systems that cannot fail, they run AIX. When Oracle, Informix, or DB2 need to deliver absolutely brutal performance for mission-critical OLTP workloads, they run on AIX.

AIX isn’t trendy. AIX doesn’t have a cool mascot. AIX won’t be the subject of breathless tech blog posts about “disruption.” But when things absolutely, positively cannot fail — AIX is there, quietly doing its job while everyone else is busy rebooting their containers.

So why doesn’t MariaDB officially support AIX? Simple economics: the open source community has centered on Linux, and porting requires platform-specific expertise. MariaDB officially supports Linux, Windows, FreeBSD, macOS, and Solaris. AIX isn’t on the list — not because it’s a bad platform, but because no one had done the work yet.

At LibrePower, that’s exactly what we do.

My first mistake was saying out loud: “It’s probably just a matter of compiling it and adjusting a few things.”

Lesson #1: When someone says “just compile it” about software on AIX, they’re about to learn a lot about systems programming.

Chapter 2: CMake and the Three Unexpected Guests

Day one of compilation was… educational. CMake on AIX is like playing cards with someone who has a very different understanding of the rules — and expects you to figure them out yourself.

The Ghost Function Bug

AIX has an interesting characteristic: it declares functions in headers for compatibility even when those functions don’t actually exist at runtime. It’s like your GPS saying “turn right in 200 meters” but the street is a brick wall.

CMake does a CHECK_C_SOURCE_COMPILES to test if pthread_threadid_np() exists. The code compiles. CMake says “great, we have it!” The binary starts and… BOOM. Symbol not found.

Turns out pthread_threadid_np() is macOS-only. AIX declares it in headers because… well, I’m still not entirely sure. Maybe for some POSIX compatibility reason that made sense decades ago? Whatever the reason, GCC compiles it happily, and nothing complains until the loader fails to resolve the symbol at runtime.

Same story with getthrid(), which is OpenBSD-specific.

The fix:

IF(NOT CMAKE_SYSTEM_NAME MATCHES "AIX")
  CHECK_C_SOURCE_COMPILES("..." HAVE_PTHREAD_THREADID_NP)
ELSE()
  SET(HAVE_PTHREAD_THREADID_NP 0)  # Trust but verify... okay, just verify
ENDIF()
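A complementary sanity check, sketched here in portable C rather than taken from the port: instead of trusting that a declaration exists, ask the dynamic loader whether the symbol can actually be resolved in the running process. The helper name symbol_is_resolvable is hypothetical, and the sketch assumes a platform with POSIX dlopen/dlsym where dlopen(NULL, ...) returns a handle for the running program.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if the loader can resolve `name` in the running process
 * (the program plus the shared libraries it loaded), 0 otherwise.
 * A prototype in a header says nothing about this. */
int symbol_is_resolvable(const char *name)
{
    void *self = dlopen(NULL, RTLD_LAZY);   /* handle for the running program */
    if (self == NULL)
        return 0;

    int found = dlsym(self, name) != NULL;  /* NULL: nothing exports it */
    dlclose(self);
    return found;
}
```

On a system where a function is only declared, a call such as symbol_is_resolvable("pthread_threadid_np") would be expected to return 0, which is the signal the compile-only check misses. Older glibc versions need -ldl at link time.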

poll.h: Hide and Seek

AIX has <sys/poll.h>. It’s right there. You can cat it. But CMake doesn’t detect it.

After three hours debugging a “POLLIN undeclared” error in viosocket.c, I discovered the solution was simply forcing the define:

cmake ... -DHAVE_SYS_POLL_H=1

Three hours. For one flag.

(To be fair, this is a CMake platform detection issue, not an AIX issue. CMake’s checks assume Linux-style header layouts.)

The Cursed Plugins

At 98% compilation — 98%! — the wsrep_info plugin exploded with undefined symbols. Because it depends on Galera. Which we’re not using. But CMake compiles it anyway.

The same went for S3 (requires Aria symbols), Mroonga (requires Groonga), and RocksDB (deeply tied to Linux-specific optimizations).

Final CMake configuration:

-DPLUGIN_MROONGA=NO -DPLUGIN_ROCKSDB=NO -DPLUGIN_SPIDER=NO 
-DPLUGIN_TOKUDB=NO -DPLUGIN_OQGRAPH=NO -DPLUGIN_S3=NO -DPLUGIN_WSREP_INFO=NO

It looks like surgical amputation, but it’s actually just trimming the fat. These plugins are edge cases that few deployments need.

Chapter 3: Thread Pool, or How I Learned to Stop Worrying and Love the Mutex

This is where things got interesting. And by “interesting” I mean “I nearly gave myself a permanent twitch.”

MariaDB has two connection handling modes:

  • one-thread-per-connection: One thread per client. Simple. Scales like a car going uphill.
  • pool-of-threads: A fixed pool of threads handles all connections. Elegant. Efficient. And not available on AIX.

Why? Because the thread pool requires platform-specific I/O multiplexing APIs:

| Platform | API | Status |
|---|---|---|
| Linux | epoll | Supported |
| FreeBSD/macOS | kqueue | Supported |
| Solaris | event ports | Supported |
| Windows | IOCP | Supported |
| AIX | pollset | Not supported (until now) |

So… how hard can implementing pollset support be?

(Editor’s note: At this point the author required a 20-minute break and a beverage)

The ONESHOT Problem

Linux epoll has a wonderful flag called EPOLLONESHOT. It guarantees that a file descriptor fires events only once until you explicitly re-arm it. This prevents two threads from processing the same connection simultaneously.

AIX pollset is level-triggered. Only level-triggered. No options. If data is available, it reports it. Again and again and again. Like a helpful colleague who keeps reminding you about that email you haven’t answered yet.
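To make "again and again" concrete, here is a minimal sketch that uses plain POSIX poll() as a stand-in for pollset (an assumption made for portability; poll() is level-triggered as well). The function name count_level_triggered_reports is made up for this illustration. It writes one byte into a pipe, never reads it, and counts how often the read end is reported as readable.

```c
#include <poll.h>
#include <unistd.h>

/* Polls the read end of a pipe `rounds` times without draining it and
 * returns how often it was reported readable (-1 on setup failure). */
int count_level_triggered_reports(int rounds)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    char byte = 'x';
    if (write(fds[1], &byte, 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };
    int reports = 0;
    for (int i = 0; i < rounds; i++) {
        pfd.revents = 0;
        /* timeout 0: just ask "is it readable right now?" */
        if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN))
            reports++;   /* the byte is still there, so it is reported again */
    }

    close(fds[0]);
    close(fds[1]);
    return reports;
}
```

Every round reports the same unread byte. With several threads polling the same set, each of them can be handed that one event, which is exactly what a one-shot mode is meant to prevent.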

Eleven Versions of Increasing Wisdom

What followed were eleven iterations of code, each more elaborate than the last, trying to simulate ONESHOT behavior:

v1-v5 (The Age of Innocence)

I tried modifying event flags with PS_MOD. “If I change the event to 0, it’ll stop firing,” I thought. Spoiler: it didn’t stop firing.

v6-v7 (The State Machine Era)

“I know! I’ll maintain internal state and filter duplicate events.” The problem: there’s a time window between the kernel giving you the event and you updating your state. In that window, another thread can receive the same event.

v8-v9 (The Denial Phase)

“I’ll set the state to PENDING before processing.” It worked… sort of… until it didn’t.

v10 (Hope)

Finally found the solution: PS_DELETE + PS_ADD. When you receive an event, immediately delete the fd from the pollset. When you’re ready for more data, add it back.

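// On receiving events: remove every ready fd from the pollset right away,
// so no other thread is told about it (this emulates ONESHOT)
for (i = 0; i < ret; i++) {
    pctl.cmd = PS_DELETE;
    pctl.fd = native_events[i].fd;
    pollset_ctl(pollfd, &pctl, 1);
}

// When ready for more data: add the fd back (re-arm)
pce.command = PS_ADD;
pollset_ctl_ext(pollfd, &pce, 1);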

It worked! With -O2.

With -O3: segfault.
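The delete-then-re-add idea can be tried out away from AIX. The following is a portable sketch, not the code from the port: it uses POSIX poll() in place of pollset, and the oneshot_set type and its functions are hypothetical names invented for the example. A descriptor is only polled while it is armed; reporting it disarms it (the counterpart of PS_DELETE), and the caller re-arms it when ready (the counterpart of PS_ADD).

```c
#include <poll.h>
#include <unistd.h>

#define ONESHOT_MAX_FDS 16

/* Hypothetical interest set: a slot is polled only while it is armed. */
struct oneshot_set {
    int fds[ONESHOT_MAX_FDS];
    int armed[ONESHOT_MAX_FDS];
    int count;
};

/* Registers fd in armed state. Returns its slot index, or -1 if full. */
int oneshot_add(struct oneshot_set *s, int fd)
{
    if (s->count >= ONESHOT_MAX_FDS)
        return -1;
    s->fds[s->count] = fd;
    s->armed[s->count] = 1;
    return s->count++;
}

/* Re-arms a slot once the caller is ready for more data. */
void oneshot_rearm(struct oneshot_set *s, int slot)
{
    s->armed[slot] = 1;
}

/* Non-blocking wait: stores the slots of readable, armed fds in `ready`
 * and disarms each of them before returning. Returns how many were stored. */
int oneshot_wait(struct oneshot_set *s, int *ready, int max_ready)
{
    struct pollfd pfds[ONESHOT_MAX_FDS];
    int slot_of[ONESHOT_MAX_FDS];
    int n = 0;

    for (int i = 0; i < s->count; i++) {
        if (s->armed[i]) {
            pfds[n].fd = s->fds[i];
            pfds[n].events = POLLIN;
            pfds[n].revents = 0;
            slot_of[n] = i;
            n++;
        }
    }
    if (n == 0 || poll(pfds, n, 0) <= 0)
        return 0;

    int out = 0;
    for (int i = 0; i < n && out < max_ready; i++) {
        if (pfds[i].revents & POLLIN) {
            s->armed[slot_of[i]] = 0;   /* delete-on-delivery */
            ready[out++] = slot_of[i];
        }
    }
    return out;
}
```

With one unread byte in a pipe, the first oneshot_wait() reports the slot, the second reports nothing, and a call to oneshot_rearm() makes it visible again. Note that this sketch is single-threaded; it shows the one-shot behavior only, not thread safety.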

The Dark Night of the Soul (The -O3 Bug)

Picture my face. I have code working perfectly with -O2. I enable -O3 for production benchmarks and the server crashes with “Got packets out of order” or a segfault in CONNECT::create_thd().

I spent two days thinking it was a compiler bug. GCC 13.3.0 on AIX. I blamed the compiler. I blamed the linker. I blamed everything except my own code.

The problem was subtler: MariaDB has two concurrent code paths calling io_poll_wait on the same pollset:

  • The listener blocks with timeout=-1
  • The worker calls with timeout=0 for non-blocking checks

With -O2, the timing was such that these rarely collided. With -O3, the code was faster, collisions happened more often, and boom — race condition.

v11 (Enlightenment)

The fix was a dedicated mutex protecting both pollset_poll and all pollset_ctl operations:

static pthread_mutex_t pollset_mutex = PTHREAD_MUTEX_INITIALIZER;

int io_poll_wait(...) {
    int ret;
    // One lock covers both the poll and the follow-up pollset_ctl calls
    pthread_mutex_lock(&pollset_mutex);
    ret = pollset_poll(pollfd, native_events, max_events, timeout);
    // ... process and delete events ...
    pthread_mutex_unlock(&pollset_mutex);
    return ret;
}

Yes, it serializes pollset access. Yes, that’s theoretically slower. But you know what’s even slower? A server that crashes.

The final v11 code passed 72 hours of stress testing with 1,000 concurrent connections. Zero crashes. Zero memory leaks. Zero “packets out of order.”
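For readers who want to see the locking pattern in isolation, here is a small portable sketch. It assumes POSIX poll() and pthreads in place of pollset, and every name in it (guarded_wait, run_guarded_demo, and so on) is hypothetical. Several threads race for a single readable pipe; because the poll and the removal happen under the same mutex, the event is handed out exactly once.

```c
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t set_mutex = PTHREAD_MUTEX_INITIALIZER;
static int watched_fd = -1;   /* -1 means "currently not in the set" */
static int deliveries = 0;    /* how many times the event was handed out */

/* Poll and remove under one lock: no other thread can observe the fd
 * between "reported ready" and "removed from the set". */
static int guarded_wait(void)
{
    int got = 0;
    pthread_mutex_lock(&set_mutex);
    if (watched_fd >= 0) {
        struct pollfd pfd = { .fd = watched_fd, .events = POLLIN, .revents = 0 };
        if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN)) {
            watched_fd = -1;  /* delete-on-delivery */
            deliveries++;
            got = 1;
        }
    }
    pthread_mutex_unlock(&set_mutex);
    return got;
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        guarded_wait();
    return NULL;
}

/* Starts up to 8 threads racing for one readable pipe and returns how
 * many times the event was delivered (-1 on setup failure). */
int run_guarded_demo(int nthreads)
{
    pthread_t threads[8];
    int p[2];
    int started = 0;

    if (nthreads < 1) nthreads = 1;
    if (nthreads > 8) nthreads = 8;
    if (pipe(p) != 0)
        return -1;
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    pthread_mutex_lock(&set_mutex);
    watched_fd = p[0];
    deliveries = 0;
    pthread_mutex_unlock(&set_mutex);

    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&threads[started], NULL, worker, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    close(p[0]);
    close(p[1]);
    return deliveries;
}
```

If the poll and the removal were done in separate critical sections, or with no lock at all, two threads could both see the pipe as readable before either removed it. That is the shape of the collision described above, reduced to one descriptor.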

Chapter 4: The -blibpath Thing (Actually a Feature)

One genuine AIX characteristic: you need to explicitly specify the library path at link time with -Wl,-blibpath:/your/path. If you don’t, the binary won’t find libstdc++ even if it’s in the same directory.

At first this seems annoying. Then you realize: AIX prefers explicit, deterministic paths over implicit searches. In production environments where “it worked on my machine” isn’t acceptable, that’s a feature, not a bug.

Chapter 5: Stability — The Numbers That Matter

After all this work, where do we actually stand?

The RPM is published at aix.librepower.org and deployed on an IBM POWER9 system (12 cores, SMT-8). MariaDB 11.8.5 runs on AIX 7.3 with thread pool enabled. The server passed a brutal QA suite:

| Test | Result |
|---|---|
| 100 concurrent connections | |
| 500 concurrent connections | |
| 1,000 connections | |
| 30 minutes sustained load | |
| 11+ million queries | |
| Memory leaks | ZERO |

1,648,482,400 bytes of memory — constant across 30 minutes. Not a single byte of drift. The server ran for 39 minutes under continuous load and performed a clean shutdown.

It works. It’s stable. Functionally, it’s production-ready.

Thread Pool Impact

The thread pool work delivered massive gains for concurrent workloads:

| Configuration | Mixed 100 clients | vs. Baseline |
|---|---|---|
| Original -O2 one-thread-per-connection | 11.34s | |
| -O3 + pool-of-threads v11 | 1.96s | 83% faster |

For high-concurrency OLTP workloads, this is the difference between “struggling” and “flying.”
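A quick check of the headline number, using only the two timings from the table: since both values are elapsed times for the same mixed 100-client run, "83% faster" is best read as an 83% reduction in elapsed time, which corresponds to a speedup of a little under 6x.

```latex
\[
\text{time reduction} = \frac{11.34 - 1.96}{11.34} = \frac{9.38}{11.34} \approx 0.827 \approx 83\%,
\qquad
\text{speedup} = \frac{11.34}{1.96} \approx 5.8\times
\]
```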

What I Learned (So Far)

  1. CMake assumes Linux. On non-Linux systems, manually verify that feature detection is correct. False positives will bite you at runtime.
  2. Level-triggered I/O requires discipline. EPOLLONESHOT exists for a reason. If your system doesn’t have it, prepare to implement your own serialization.
  3. -O3 exposes latent bugs. If your code “works with -O2 but not -O3,” suspect a race condition or undefined behavior in your own code before you blame the compiler. The compiler is almost always doing its job; the bug is almost always yours.
  4. Mutexes are your friend. Yes, they have overhead. But you know what has more overhead? Debugging race conditions at 3 AM.
  5. AIX rewards deep understanding. It’s a system that doesn’t forgive shortcuts, but once you understand its conventions, it’s predictable and robust. There’s a reason banks still run it — and will continue to for the foreseeable future.
  6. The ecosystem matters. Projects like linux-compat from LibrePower make modern development viable on AIX. Contributing to that ecosystem benefits everyone.

What’s Next: The Performance Question

The server is stable. The thread pool works. But there’s a question hanging in the air that I haven’t answered yet:

How fast is it compared to Linux?

I ran a vector search benchmark, the kind of operation that powers AI-enhanced search features: MariaDB’s MHNSW index (a modified Hierarchical Navigable Small World graph), 100,000 vectors, 768 dimensions.

Linux on identical POWER9 hardware: 971 queries per second.

AIX with our new build: 42 queries per second.

Twenty-three times slower.

My heart sank. Three weeks of work, and we’re 23x slower than Linux? On identical hardware?

But here’s the thing about engineering: when numbers don’t make sense, there’s always a reason. And sometimes that reason turns out to be surprisingly good news.

In Part 2, I’ll cover:

  • How we discovered the 23x gap was mostly a configuration mistake
  • The compiler that changed everything
  • Why “AIX is slow” turned out to be a myth
  • The complete “Failure Museum” of optimizations that didn’t work

The RPMs are published at aix.librepower.org. The GCC build is stable and functionally production-ready.

But the performance story? That’s where things get really interesting.

Part 2 coming soon.

TL;DR

  • MariaDB 11.8.5 now runs on AIX 7.3 with thread pool enabled
  • First-ever thread pool implementation for AIX using pollset (11 iterations to get ONESHOT simulation right)
  • Server is stable: 1,000 connections, 11M+ queries, zero memory leaks
  • Thread pool delivers 83% improvement for concurrent workloads
  • Initial vector search benchmark shows 23x gap vs Linux — but is that the whole story?
  • RPMs published at aix.librepower.org
  • Part 2 coming soon: The performance investigation

Questions? Ideas? Want to contribute to the AIX open source ecosystem?

This work is part of LibrePower – Unlocking IBM Power Systems through open source. Unmatched RAS. Superior TCO. Minimal footprint 🌍

LibrePower AIX project repository: gitlab.com/librepower/aix

SIXE