Run an LLM on IBM i via PASE — No Linux Required

IBM i · March 2026

We Ran an LLM on IBM i. No Linux. No Cloud. No GPU.

llama.cpp compiled for AIX runs natively on IBM i via PASE. Your RPG programs can call a local language model without adding infrastructure or sending data anywhere.


If you manage an IBM i system, you know how this conversation goes. Someone asks about AI, and the answers are always the same: "spin up a Linux LPAR", "use OpenAI", "check out Wallaroo". Every option means leaving the platform, adding layers, and at some point sending business data to a server you don't control.

There are 150,000 IBM i systems processing transactions in banking, insurance, and healthcare. The answer can't always be "add more infrastructure". So we tried something different.

The experiment

What we actually did

We took llama.cpp — the most widely used open-source LLM inference engine — compiled it for AIX, and copied the binary to an IBM i V7R5 partition. We ran it via PASE. It worked on the first try.

$ uname -a
OS400 WWW 5 7 007800001B91

$ /QOpenSys/pkgs/bin/python3 -c "import platform; print(platform.platform())"
OS400-5-007800001B91-powerpc-64bit

$ /QOpenSys/pkgs/bin/python3 -c "import sys; print('Byte order:', sys.byteorder)"
Byte order: big

That's IBM i V7R5 on pub400.com — a public IBM i system. Big-endian, powerpc-64bit, OS400. Not Linux, not AIX. IBM i.

What kind of binary

$ file llama/llama-simple
llama/llama-simple: 64-bit XCOFF executable or object module

A 64-bit XCOFF binary — the native executable format for AIX. Compiled on AIX 7.3 POWER using GCC 13.3 with VSX vector extensions enabled. The same binary from our llama-aix project, which already ships 10 big-endian GGUF models on HuggingFace.

First run

$ LIBPATH=/home/HBSIXE/llama /home/HBSIXE/llama/llama-simple --help

example usage:

    /home/HBSIXE/llama/llama-simple -m model.gguf [-n n_predict] [prompt]

The binary loads, links libggml and libllama, parses arguments, and responds. All inside PASE. To run actual inference, you point it at a big-endian GGUF model:

$ LIBPATH=/home/HBSIXE/llama /home/HBSIXE/llama/llama-simple \
    -m models/tinyllama-1.1b-q4_k_m-be.gguf \
    -n 100 "What is IBM i?"

IBM i PASE terminal running llama.cpp: the XCOFF binary loads, links the libraries, and responds to a prompt in real time

The context

Why this matters for IBM i shops

In 2026, the AI conversation in the IBM i community is louder than ever. IBM just launched Bob (the successor to WCA for i), a coding assistant for RPG developers. 70% of IBM i customers plan hardware upgrades this year. And yet there's one question that still doesn't have a clean answer:

How do I integrate an LLM into my IBM i applications without depending on an external service?

The usual options, right now:

Option | What it means | The catch
Linux LPAR | Spin up a separate partition, run the LLM there, call it from RPG via API | New hardware to manage, added cost, data crosses partition boundaries
Cloud API | Call OpenAI, Azure, or AWS from RPG | Business data leaves the machine. A serious problem in banking, insurance, and healthcare
Wallaroo | Option 1 packaged as a service | $500/month. Still a Linux LPAR with branding
PASE + llama.cpp | The LLM runs inside IBM i itself, via PASE | No extra hardware. Data never leaves the partition.

What about IBM Bob?

Bob is for the developer: it helps understand, document, and generate RPG code from the IDE. What we describe here is for the production application: an LLM running in the same partition that any RPG program can call like a local API. They solve different problems. Bob for the dev workflow, local inference for the apps themselves.

The technical foundation

PASE: the bridge you already have

PASE (Portable Application Solutions Environment) is a runtime built into IBM i that executes AIX binaries natively. It's not emulation — it's a layer that exposes AIX system calls directly on top of the IBM i kernel. If something runs on AIX, it can generally run on IBM i via PASE.

┌──────────────────────────────────────────┐
│ IBM i (OS400)                            │
│                                          │
│  ┌──────────────┐     ┌───────────────┐  │
│  │ RPG / CL     │     │ PASE          │  │
│  │ COBOL / Db2  │───→ │ (AIX runtime) │  │
│  │              │     │               │  │
│  │ localhost    │     │ llama-server  │  │
│  │ :8080        │     │ + GGUF model  │  │
│  └──────────────┘     └───────────────┘  │
│                                          │
│            IBM POWER Hardware            │
└──────────────────────────────────────────┘

We've been building and shipping AIX packages through LibrePower's AIX repository for years — over 30 open-source packages installable via DNF. When llama.cpp joined the catalogue, testing the jump to IBM i was the natural next step. PASE handles the rest.

For IBM i administrators

You don't need to install anything special on the operating system. PASE is already active. All you need is the XCOFF binary of llama.cpp and a big-endian GGUF model. The LLM runs as a regular PASE process, without touching the native IBM i environment.

The technical hurdle

The big-endian problem (and how we solved it)

There's a reason nobody had done this cleanly before: byte order. IBM i and AIX are big-endian. Virtually all AI software — x86, ARM, Linux ppc64le — assumes little-endian. A GGUF file downloaded from HuggingFace won't load on IBM i: the bytes are in the wrong order.
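The mismatch is concrete: the same 32-bit value serializes to different bytes under each convention, so a little-endian reader applied to big-endian data gets garbage. A quick illustration with the same PASE Python used earlier:

```python
import struct

# The 32-bit integer 1, as it would be laid out on disk:
print(struct.pack("<I", 1).hex())  # little-endian: 01000000
print(struct.pack(">I", 1).hex())  # big-endian:    00000001

# Reading big-endian bytes as little-endian scrambles the value:
print(struct.unpack("<I", struct.pack(">I", 1))[0])  # 16777216, not 1
```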

We'd already solved this in our AIX work. The solution: convert the models before distributing them. We publish big-endian GGUF models at huggingface.co/librepowerai, validated on real AIX hardware and ready to load directly on IBM i PASE.
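If you're unsure which flavour of GGUF file you have, the header gives it away: the file starts with the literal bytes GGUF followed by a uint32 version, and any plausible version read with the wrong byte order comes out absurdly large. A small heuristic sketch (ours, not part of llama.cpp):

```python
import struct

def gguf_byte_order(header: bytes) -> str:
    """Guess the byte order of a GGUF file from its first 8 bytes.

    Heuristic: the version field follows the 4 magic bytes; a real
    version (e.g. 3) read as the wrong endianness becomes 0x03000000.
    """
    if header[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    version_le = struct.unpack("<I", header[4:8])[0]
    return "little" if 0 < version_le < 0x10000 else "big"

# Usage:
#   with open("tinyllama-be.gguf", "rb") as f:
#       print(gguf_byte_order(f.read(8)))
```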

Model | Size | Quantization
TinyLlama 1.1B Chat | 668 MB | Q4_K_M
LFM 1.2B Instruct | 695 MB | Q4_K_M
LFM 1.2B Thinking | 731 MB | Q4_K_M

7 more available

These are the same models that reach 10–12 tok/s on AIX POWER. On IBM i POWER10 — with MMA hardware acceleration active via OpenBLAS — performance should be comparable or better. Concrete IBM i benchmarks are in progress.

From PoC to production

From proof of concept to production

Running --help proves the binary loads. The real path to useful AI in your applications has three stages, and the first one is available right now.

Stage 1: Direct inference (available now)

From any SSH or QSH session on the IBM i:

# Direct inference from the command line
LIBPATH=/path/to/llama /path/to/llama/llama-simple \
    -m /path/to/model.gguf \
    -n 200 "Summarize this purchase order"

Useful for CL scripts, batch jobs, or just verifying that the model loads and responds correctly on your specific hardware before going further.
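From a batch job or script, the same invocation can be wrapped with Python's subprocess module. A sketch following the usage string printed by --help; the paths are placeholders you'd adjust to your install:

```python
import os
import subprocess

LLAMA_DIR = "/path/to/llama"    # placeholder: your llama.cpp install
MODEL = "/path/to/model.gguf"   # placeholder: a big-endian GGUF model

def build_cmd(prompt: str, n_predict: int = 200) -> list:
    """Assemble the llama-simple argv: -m model [-n n_predict] [prompt]."""
    return [f"{LLAMA_DIR}/llama-simple",
            "-m", MODEL,
            "-n", str(n_predict),
            prompt]

def run(prompt: str) -> str:
    # LIBPATH tells the AIX loader where to find libggml / libllama
    env = {**os.environ, "LIBPATH": LLAMA_DIR}
    result = subprocess.run(build_cmd(prompt), env=env,
                            capture_output=True, text=True)
    return result.stdout
```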

Stage 2: OpenAI-compatible API server (coming soon)

llama.cpp includes llama-server, which exposes an HTTP endpoint compatible with the OpenAI API. Once running in PASE, any RPG program can call it using QSYS2.HTTP_POST — exactly like any other API:

# Start the inference server on IBM i via PASE (bind to loopback only)
LIBPATH=/path/to/llama /path/to/llama/llama-server \
    -m /path/to/model.gguf \
    --host 127.0.0.1 --port 8080 -t 8

// Call it from RPG — the LLM is on localhost
dcl-s url varchar(256) inz('http://localhost:8080/v1/chat/completions');
dcl-s body varchar(65535);
dcl-s response varchar(65535);
// QSYS2.HTTP_POST — no data leaves IBM i

The important part: localhost. The model is on the same machine. Data never leaves the partition.
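For testing outside RPG, the same endpoint can be exercised from PASE Python with nothing but the standard library. A sketch assuming llama-server is already listening on port 8080; /v1/chat/completions is llama-server's OpenAI-compatible route, and the model name in the payload is just a label:

```python
import json
from urllib import request

def build_chat_body(prompt: str, max_tokens: int = 200) -> bytes:
    # Payload follows the OpenAI chat-completions schema that
    # llama-server accepts; the "model" field is ignored locally.
    return json.dumps({
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()

def chat(prompt: str,
         url: str = "http://localhost:8080/v1/chat/completions") -> str:
    req = request.Request(url, data=build_chat_body(prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:  # loopback: never leaves the partition
        return json.load(resp)["choices"][0]["message"]["content"]
```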

Stage 3: Business application integration (in development)

  • Document analysis: feed Db2 reports to the LLM for automatic summarization
  • Natural language queries: the user types in plain English, the LLM returns SQL
  • RPG code modernization: the LLM analyzes and documents existing programs without leaving IBM i
  • Intelligent monitoring: analyze QSYSOPR messages and job logs with semantic context
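To make the structured-extraction idea concrete, one workable pattern is to ask the model for JSON only and then parse its reply defensively, since small models sometimes wrap the JSON in prose. The prompt wording and field names below are ours, purely illustrative:

```python
import json

def extraction_prompt(text: str) -> str:
    # Instruct the model to answer with JSON only; keys are our choice.
    return ("Extract the purchase order number and total amount from the "
            "text below. Reply with JSON only, using keys \"po\" and "
            "\"total\".\n\n" + text)

def parse_reply(reply: str) -> dict:
    # Keep only the outermost {...} in case the model added prose around it.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in model reply")
    return json.loads(reply[start:end + 1])
```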
A note on performance: small models (1–2B parameters) running in PASE are more than enough for classification, summarization, structured data extraction, and fixed-format responses. For longer text generation or complex reasoning, 7B+ models scale well with more threads. IBM i POWER10 benchmarks are in progress.

Hands-on

How to try it yourself

If you have access to an IBM i with PASE active, it's three steps.

1. Get the llama.cpp binary for AIX

Available on LibrePower's GitLab. If you have DNF/yum configured:

# From AIX (or via PASE if you have dnf)
dnf install llama-cpp

2. Download a big-endian model

curl -L -o tinyllama-be.gguf \
  "https://huggingface.co/librepowerai/TinyLlama-1.1B-Chat-v1.0-GGUF-big-endian/resolve/main/tinyllama-1.1b-q4_k_m-be.gguf"

TinyLlama is a solid starting point: 668 MB, fast to load, and enough to verify everything works before moving to larger models.

3. Run inference

LIBPATH=/path/to/llama ./llama-simple \
    -m tinyllama-be.gguf \
    -n 150 "What is IBM i?"

IBM i in production?

SIXE has been supporting IBM i environments for years. If you want to understand whether this approach fits your architecture — or what it means for your RPG applications — get in touch. No strings attached.

Roadmap

What's next

This is a solid proof of concept, not a finished product. Here's what we're working on next:

  • llama-server on IBM i — the HTTP API server running in PASE, documented and packaged so you can get it running in minutes
  • RPG integration examples — real code for calling the LLM from RPG programs via QSYS2.HTTP_POST
  • IBM i POWER10/POWER11 benchmarks — real tok/s measurements with PASE on production hardware
  • Larger models — testing 7B+ models on partitions with enough memory
  • vLLM for IBM i — our vLLM package for ppc64le, adapted to run in PASE

More from LibrePower

Project | What it does
llama-aix | llama.cpp for AIX with 10 big-endian GGUF models ready to download
linux.librepower.org | APT repository with vLLM for Linux ppc64le (Ubuntu/Debian)
aix.librepower.org | 30+ open-source packages for AIX, installable via DNF

Got IBM i with PASE?

Try the LLM on your own partition

The binary is on GitLab. The models are on HuggingFace. If you have PASE access and a few minutes, you can replicate exactly what we describe here :)

SIXE