Run an LLM on IBM i via PASE — No Linux Required

IBM i · March 2026

We Ran an LLM on IBM i. No Linux. No Cloud. No GPU.

llama.cpp compiled for AIX runs natively on IBM i via PASE. Your RPG programs can call a local language model without adding infrastructure or sending data anywhere.


If you manage an IBM i system, you know how this conversation goes. Someone asks about AI, and the answers are always the same: "spin up a Linux LPAR", "use OpenAI", "check out Wallaroo". Every option means leaving the platform, adding layers, and at some point sending business data to a server you don't control.

There are 150,000 IBM i systems processing transactions in banking, insurance, and healthcare. The answer can't always be "add more infrastructure". So we tried something different.

The experiment

What we actually did

We took llama.cpp — the most widely used open-source LLM inference engine — compiled it for AIX, and copied the binary to an IBM i V7R5 partition. We ran it via PASE. It worked on the first try.

$ uname -a
OS400 WWW 5 7 007800001B91

$ /QOpenSys/pkgs/bin/python3 -c "import platform; print(platform.platform())"
OS400-5-007800001B91-powerpc-64bit

$ /QOpenSys/pkgs/bin/python3 -c "import sys; print('Byte order:', sys.byteorder)"
Byte order: big

That's IBM i V7R5 on pub400.com — a public IBM i system. Big-endian, powerpc-64bit, OS400. Not Linux, not AIX. IBM i.

What kind of binary

$ file llama/llama-simple
llama/llama-simple: 64-bit XCOFF executable or object module

A 64-bit XCOFF binary — the native executable format for AIX. Compiled on AIX 7.3 POWER using GCC 13.3 with VSX vector extensions enabled. The same binary from our llama-aix project, which already ships 10 big-endian GGUF models on HuggingFace.

First run

$ LIBPATH=/home/HBSIXE/llama /home/HBSIXE/llama/llama-simple --help

example usage:

    /home/HBSIXE/llama/llama-simple -m model.gguf [-n n_predict] [prompt]

The binary loads, links libggml and libllama, parses arguments, and responds. All inside PASE. To run actual inference, you point it at a big-endian GGUF model:

$ LIBPATH=/home/HBSIXE/llama /home/HBSIXE/llama/llama-simple \
    -m models/tinyllama-1.1b-q4_k_m-be.gguf \
    -n 100 "What is IBM i?"

IBM i PASE terminal running llama.cpp: the XCOFF binary loads, links the libraries, and responds to a prompt in real time

The context

Why this matters for IBM i shops

In 2026, the AI conversation in the IBM i community is louder than ever. IBM just launched Bob (the successor to WCA for i), a coding assistant for RPG developers. 70% of IBM i customers plan hardware upgrades this year. And yet there's one question that still doesn't have a clean answer:

How do I integrate an LLM into my IBM i applications without depending on an external service?

The usual options, right now:

Option | What it means | The catch
Linux LPAR | Spin up a separate partition, run the LLM there, call it from RPG via API | New hardware to manage, added cost, data crosses partition boundaries
Cloud API | Call OpenAI, Azure, or AWS from RPG | Business data leaves the machine. A serious problem in banking, insurance, and healthcare
Wallaroo | Option 1 packaged as a service | $500/month. Still a Linux LPAR with branding
PASE + llama.cpp | The LLM runs inside IBM i itself, via PASE | No extra hardware. Data never leaves the partition.

What about IBM Bob?

Bob is for the developer: it helps understand, document, and generate RPG code from the IDE. What we describe here is for the production application: an LLM running in the same partition that any RPG program can call like a local API. They solve different problems. Bob for the dev workflow, local inference for the apps themselves.

The technical foundation

PASE: the bridge you already have

PASE (Portable Application Solutions Environment) is a runtime built into IBM i that executes AIX binaries natively. It's not emulation — it's a layer that exposes AIX system calls directly on top of the IBM i kernel. If something runs on AIX, it can generally run on IBM i via PASE.

┌──────────────────────────────────────────┐
│ IBM i (OS400)                            │
│                                          │
│  ┌──────────────┐     ┌───────────────┐  │
│  │ RPG / CL     │     │ PASE          │  │
│  │ COBOL / Db2  │───→ │ (AIX runtime) │  │
│  │              │     │               │  │
│  │ localhost    │     │ llama-server  │  │
│  │ :8080        │     │ + GGUF model  │  │
│  └──────────────┘     └───────────────┘  │
│                                          │
│            IBM POWER Hardware            │
└──────────────────────────────────────────┘

We've been building and shipping AIX packages through LibrePower's AIX repository for years — over 30 open-source packages installable via DNF. When llama.cpp joined the catalogue, testing the jump to IBM i was the natural next step. PASE handles the rest.

For IBM i administrators

You don't need to install anything special on the operating system. PASE is already active. All you need is the XCOFF binary of llama.cpp and a big-endian GGUF model. The LLM runs as a regular PASE process, without touching the native IBM i environment.

The technical hurdle

The big-endian problem (and how we solved it)

There's a reason nobody had done this cleanly before: byte order. IBM i and AIX are big-endian. Virtually all AI software — x86, ARM, Linux ppc64le — assumes little-endian. A GGUF file downloaded from HuggingFace won't load on IBM i: the bytes are in the wrong order.
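The mismatch is concrete: the same 32-bit value serializes to different bytes under each convention, so a little-endian reader applied to big-endian data gets garbage. A quick illustration with the same PASE Python used earlier:

```python
import struct

# The 32-bit integer 1, as it would be laid out on disk:
print(struct.pack("<I", 1).hex())  # little-endian: 01000000
print(struct.pack(">I", 1).hex())  # big-endian:    00000001

# Reading big-endian bytes as little-endian scrambles the value:
print(struct.unpack("<I", struct.pack(">I", 1))[0])  # 16777216, not 1
```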

We'd already solved this in our AIX work. The solution: convert the models before distributing them. We publish big-endian GGUF models at huggingface.co/librepowerai, validated on real AIX hardware and ready to load directly on IBM i PASE.
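If you're unsure which flavour of GGUF file you have, the header gives it away: the file starts with the literal bytes GGUF followed by a uint32 version, and any plausible version read with the wrong byte order comes out absurdly large. A small heuristic sketch (ours, not part of llama.cpp):

```python
import struct

def gguf_byte_order(header: bytes) -> str:
    """Guess the byte order of a GGUF file from its first 8 bytes.

    Heuristic: the version field follows the 4 magic bytes; a real
    version (e.g. 3) read as the wrong endianness becomes 0x03000000.
    """
    if header[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    version_le = struct.unpack("<I", header[4:8])[0]
    return "little" if 0 < version_le < 0x10000 else "big"

# Usage:
#   with open("tinyllama-be.gguf", "rb") as f:
#       print(gguf_byte_order(f.read(8)))
```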

Model | Size | Quantization
TinyLlama 1.1B Chat | 668 MB | Q4_K_M
LFM 1.2B Instruct | 695 MB | Q4_K_M
LFM 1.2B Thinking | 731 MB | Q4_K_M

7 more available

These are the same models that reach 10–12 tok/s on AIX POWER. On IBM i POWER10 — with MMA hardware acceleration active via OpenBLAS — performance should be comparable or better. Concrete IBM i benchmarks are in progress.

From PoC to production

From proof of concept to production

Running --help proves the binary loads. The real path to useful AI in your applications has three stages, and the first one is available right now.

Stage 1: Direct inference (available now)

From any SSH or QSH session on the IBM i:

# Direct inference from the command line
LIBPATH=/path/to/llama /path/to/llama/llama-simple \
    -m /path/to/model.gguf \
    -n 200 "Summarize this purchase order"

Useful for CL scripts, batch jobs, or just verifying that the model loads and responds correctly on your specific hardware before going further.
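From a batch job or script, the same invocation can be wrapped with Python's subprocess module. A sketch following the usage string printed by --help; the paths are placeholders you'd adjust to your install:

```python
import os
import subprocess

LLAMA_DIR = "/path/to/llama"    # placeholder: your llama.cpp install
MODEL = "/path/to/model.gguf"   # placeholder: a big-endian GGUF model

def build_cmd(prompt: str, n_predict: int = 200) -> list:
    """Assemble the llama-simple argv: -m model [-n n_predict] [prompt]."""
    return [f"{LLAMA_DIR}/llama-simple",
            "-m", MODEL,
            "-n", str(n_predict),
            prompt]

def run(prompt: str) -> str:
    # LIBPATH tells the AIX loader where to find libggml / libllama
    env = {**os.environ, "LIBPATH": LLAMA_DIR}
    result = subprocess.run(build_cmd(prompt), env=env,
                            capture_output=True, text=True)
    return result.stdout
```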

Stage 2: OpenAI-compatible API server (coming soon)

llama.cpp includes llama-server, which exposes an HTTP endpoint compatible with the OpenAI API. Once running in PASE, any RPG program can call it using QSYS2.HTTP_POST — exactly like any other API:

# Start the inference server on IBM i via PASE (bind to loopback only)
LIBPATH=/path/to/llama /path/to/llama/llama-server \
    -m /path/to/model.gguf \
    --host 127.0.0.1 --port 8080 -t 8

// Call it from RPG — the LLM is on localhost
dcl-s url varchar(256) inz('http://localhost:8080/v1/chat/completions');
dcl-s body varchar(65535);
dcl-s response varchar(65535);
// QSYS2.HTTP_POST — no data leaves IBM i

The important part: localhost. The model is on the same machine. Data never leaves the partition.
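For testing outside RPG, the same endpoint can be exercised from PASE Python with nothing but the standard library. A sketch assuming llama-server is already listening on port 8080; /v1/chat/completions is llama-server's OpenAI-compatible route, and the model name in the payload is just a label:

```python
import json
from urllib import request

def build_chat_body(prompt: str, max_tokens: int = 200) -> bytes:
    # Payload follows the OpenAI chat-completions schema that
    # llama-server accepts; the "model" field is ignored locally.
    return json.dumps({
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()

def chat(prompt: str,
         url: str = "http://localhost:8080/v1/chat/completions") -> str:
    req = request.Request(url, data=build_chat_body(prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:  # loopback: never leaves the partition
        return json.load(resp)["choices"][0]["message"]["content"]
```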

Stage 3: Business application integration (in development)

  • Document analysis: feed Db2 reports to the LLM for automatic summarization
  • Natural language queries: the user types in plain English, the LLM returns SQL
  • RPG code modernization: the LLM analyzes and documents existing programs without leaving IBM i
  • Intelligent monitoring: analyze QSYSOPR messages and job logs with semantic context
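To make the structured-extraction idea concrete, one workable pattern is to ask the model for JSON only and then parse its reply defensively, since small models sometimes wrap the JSON in prose. The prompt wording and field names below are ours, purely illustrative:

```python
import json

def extraction_prompt(text: str) -> str:
    # Instruct the model to answer with JSON only; keys are our choice.
    return ("Extract the purchase order number and total amount from the "
            "text below. Reply with JSON only, using keys \"po\" and "
            "\"total\".\n\n" + text)

def parse_reply(reply: str) -> dict:
    # Keep only the outermost {...} in case the model added prose around it.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in model reply")
    return json.loads(reply[start:end + 1])
```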
A note on performance: small models (1–2B parameters) running in PASE are more than enough for classification, summarization, structured data extraction, and fixed-format responses. For longer text generation or complex reasoning, 7B+ models scale well with more threads. IBM i POWER10 benchmarks are in progress.

Hands-on

How to try it yourself

If you have access to an IBM i with PASE active, it's three steps.

1. Get the llama.cpp binary for AIX

Available on LibrePower's GitLab. If you have DNF/yum configured:

# From AIX (or via PASE if you have dnf)
dnf install llama-cpp

2. Download a big-endian model

curl -L -o tinyllama-be.gguf \
  "https://huggingface.co/librepowerai/TinyLlama-1.1B-Chat-v1.0-GGUF-big-endian/resolve/main/tinyllama-1.1b-q4_k_m-be.gguf"

TinyLlama is a solid starting point: 668 MB, fast to load, and enough to verify everything works before moving to larger models.

3. Run inference

LIBPATH=/path/to/llama ./llama-simple \
    -m tinyllama-be.gguf \
    -n 150 "What is IBM i?"

IBM i in production?

SIXE has been supporting IBM i environments for years. If you want to understand whether this approach fits your architecture — or what it means for your RPG applications — get in touch. No strings attached.

Roadmap

What's next

This is a solid proof of concept, not a finished product. Here's what we're working on next:

  • llama-server on IBM i — the HTTP API server running in PASE, documented and packaged so you can get it running in minutes
  • RPG integration examples — real code for calling the LLM from RPG programs via QSYS2.HTTP_POST
  • IBM i POWER10/POWER11 benchmarks — real tok/s measurements with PASE on production hardware
  • Larger models — testing 7B+ models on partitions with enough memory
  • vLLM for IBM i — our vLLM package for ppc64le, adapted to run in PASE

More from LibrePower

Project | What it does
llama-aix | llama.cpp for AIX with 10 big-endian GGUF models ready to download
linux.librepower.org | APT repository with vLLM for Linux ppc64le (Ubuntu/Debian)
aix.librepower.org | 30+ open-source packages for AIX, installable via DNF

Got IBM i with PASE?

Try the LLM on your own partition

The binary is on GitLab. The models are on HuggingFace. If you have PASE access and a few minutes, you can replicate exactly what we describe here :)

SIXE