FreeBSD Full Native Port of ROCm: From Day One to Compiling the Runtime

Note: I used an LLM to rewrite parts 1 and 2 into this single, longer post, since the originals were very poorly written.

Hello everyone! This is the first post of many to come. This term, I’m working at FreeBSD, and the project I've been assigned is a pretty tall order: port ROCm from Linux to FreeBSD. I'm not one to turn down a challenge like this, though—to be honest, it sounds like an absolute blast.

When I started, it took me about 2–3 days just to figure out how to navigate FreeBSD (and lowkey, I still need plenty of help), but I finally got kldload amdgpu to run successfully! For context, my main workstation is my personal laptop (Ryzen 7 7840HS, Radeon 780M, RTX 4060, 32GB RAM) running Linux, which I use as a reference. My actual development and testing rig is a FreeBSD Foundation Framework Desktop powered by a Strix Halo CPU. I initially spent about two days trying to get display graphics working on it before deciding to just drop it, go headless, and move on.

Pro tip for getting the drivers working: You need to pull firmware directly from the Linux firmware repository and use the absolute latest version of drm-kmod (don't grab it from ports), or it won't work. You might also need to recompile the kernel and world.


The Quest for the Super-Repo and the NUMA Problem

With the driver loading, it was time to move on to the important part: building the projects inside the ROCm super-repo. Looking through the project list, I spotted rocm-core and figured that was a logical place to start. I cloned the super-repo, ran CMake, ran make, and wrote a short test program to query the version. It worked on the very first try with zero issues. Incredible!

Naturally, my celebration was cut short. After a bit of Googling, I realized that rocr-runtime is where all the actual, critically important logic lives. When I tried to build that, it failed spectacularly. (To be fair, I would have been jumping for joy if it had worked out of the box.)

The runtime failed because it explicitly requires libdrm and libnuma. While libdrm on FreeBSD is fundamentally identical to its Linux counterpart, FreeBSD does not actually have a libnuma. Instead, non-uniform memory access is handled entirely by the operating system kernel, and it is handled very differently.

Linux vs. FreeBSD NUMA Architecture

Assuming you know what a NUMA memory region is and why it matters, here is the core issue: FreeBSD handles NUMA mapping via processes and threads. A process chooses to bind itself to a specific NUMA region, and consequently, whatever it touches, reads, or writes becomes part of that assigned region. Linux, on the other hand, operates on virtual memory areas (VMAs), meaning ranges of virtual addresses. It explicitly declares that memory address range X belongs to one domain, while address range Y belongs to another.

If you've done any sort of low-level development, you can immediately spot the architectural friction here. Furthermore, because FreeBSD exposes its NUMA policies via a kernel API accessed through sysctl, there is no userland library to link against.
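
To make the process-centric model concrete, here is a minimal C sketch of what asking FreeBSD to prefer one domain for the current process might look like. It assumes the cpuset_setdomain(2) system call and the vm.ndomains sysctl; the helper names numa_domain_count and bind_self_to_domain are hypothetical, and on any other OS the sketch simply pretends there is a single domain.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/domainset.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: number of NUMA domains the kernel reports. */
int numa_domain_count(void) {
#ifdef __FreeBSD__
    int ndomains = 1;
    size_t len = sizeof(ndomains);
    /* vm.ndomains is the FreeBSD sysctl that holds the domain count. */
    if (sysctlbyname("vm.ndomains", &ndomains, &len, NULL, 0) != 0)
        return 1;
    return ndomains;
#else
    return 1; /* non-FreeBSD fallback: pretend there is one domain */
#endif
}

/* Hypothetical helper: prefer `domain` for the calling process's memory.
 * Returns 0 on success and -1 on failure. */
int bind_self_to_domain(int domain) {
    assert(domain >= 0);
    if (domain >= numa_domain_count())
        return -1;
#ifdef __FreeBSD__
    domainset_t mask;
    DOMAINSET_ZERO(&mask);
    DOMAINSET_SET(domain, &mask);
    /* An id of -1 with CPU_WHICH_PID means "the current process". */
    return cpuset_setdomain(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                            sizeof(mask), &mask, DOMAINSET_POLICY_PREFER);
#else
    return 0; /* nothing to bind on the single-domain fallback */
#endif
}
```

Note what the FreeBSD call takes: a process ID and a domain mask, with no address anywhere in sight. Linux's mbind(2) is the mirror image, taking a start address and a length, which is exactly the mismatch described above.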

Writing the Translation Layer

To bypass this, I wrote a custom libnuma translation layer. I wanted to avoid making massive structural changes to the upstream ROCm source code because upstreaming those changes later would be an absolute nightmare. You can check out the translation library here.

If you look through the code, you'll notice it is incredibly lightweight. That’s intentional. I had an LLM scan through the codebase to find every unique function defined by numactl that was imported by the ROCm runtime, and then I manually implemented them. Since my current development rig is a single-domain setup, and I didn't want to rewrite FreeBSD's core memory management policies, my shim layer just returns 0 for all bindings. It's a bit hardcoded, but it works perfectly for a single domain, and we can make it elegant later.
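
For a feel of how small such a shim can be, here is a rough sketch of the single-domain idea. The function names and signatures are real libnuma ones, but which of them the runtime actually imports is an assumption on my part here, so treat this as an illustration rather than the contents of the real library.

```c
/* Single-domain stand-ins that reuse libnuma's function names.
 * The selection of functions is illustrative, not the shim's real list. */

/* libnuma returns -1 when NUMA is unusable; 0 tells callers to go ahead. */
int numa_available(void) { return 0; }

/* Highest node number: with one domain, that is node 0. */
int numa_max_node(void) { return 0; }

/* Exactly one configured node. */
int numa_num_configured_nodes(void) { return 1; }

/* Every CPU is reported as belonging to node 0. */
int numa_node_of_cpu(int cpu) { (void)cpu; return 0; }

/* Binding requests are accepted and silently ignored. */
int numa_run_on_node(int node) { (void)node; return 0; }

/* Preferred-node hints are ignored as well. */
void numa_set_preferred(int node) { (void)node; }
```

Compiled into a shared object named libnuma.so and paired with a matching header, something like this lets the runtime's existing NUMA calls link without touching the upstream source. On a multi-domain machine these answers would be wrong, which is the part that needs to become elegant later.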


Building the Custom AMD Compiler Toolchain

Getting lost in the codebase sauce means I don't remember every single line I edited, so the rest of this breakdown is based directly on my scratchpad and git diffs. One massive realization I had is that you cannot do anything until you have AMD's custom LLVM compiler toolchain fully working. It does not compile out of the box on FreeBSD. It required several targeted patches, such as mapping CLOCK_MONOTONIC_RAW (which the Linux code expects) onto FreeBSD's CLOCK_MONOTONIC_FAST, and a weird workaround for linking against the environ variable.

Honestly, I was shocked that my environ hack worked, but it correctly pulls in the environment variables at runtime. You can grab my compiler branch here: llvm-project-rocm (Commit hash: 8450d34b327abfe9e2994a6476003f33221ad8d5).
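
For readers who have never touched environ directly: it is the process-wide, NULL-terminated array of "NAME=value" strings, and code that wants it has to declare it itself. The sketch below only shows that general pattern; lookup_env is a hypothetical helper, and this is not the actual patch in the branch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* environ is provided by the C runtime; POSIX requires programs that
 * use it to declare it themselves. */
extern char **environ;

/* Hypothetical helper: return the value of `name`, or NULL if unset. */
const char *lookup_env(const char *name) {
    assert(name != NULL);
    size_t len = strlen(name);
    if (environ == NULL)
        return NULL;
    for (char **entry = environ; *entry != NULL; ++entry) {
        /* Match "NAME=" exactly so that "FOO" does not match "FOOBAR=1". */
        if (strncmp(*entry, name, len) == 0 && (*entry)[len] == '=')
            return *entry + len + 1;
    }
    return NULL;
}
```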

This is the exact CMake command I used to build the toolchain:

cmake -S llvm -B build \
 -DLLVM_ENABLE_PROJECTS="clang;lld" \
 -DLLVM_EXTERNAL_PROJECTS="amd-device-libs;comgr" \
 -DLLVM_EXTERNAL_AMD_DEVICE_LIBS_SOURCE_DIR="/root/dev/llvm-project/amd/device-libs" \
 -DLLVM_EXTERNAL_COMGR_SOURCE_DIR="/root/dev/llvm-project/amd/comgr" \
 -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" \
 -DCMAKE_BUILD_TYPE=RelWithDebInfo \
 -DCMAKE_INSTALL_PREFIX=/usr/local/rocm-llvm

A bizarre build quirk: if you change the build type to Debug, the build fails at the linking stage. It appears the build system aggressively tries to optimize specific sections and strips out necessary debug info in the process. If you want to replicate this headache, just switch the build type to Debug and watch it break.

Compiling the rocr-runtime

Good news on the NUMA front: my shoddily coded shim layer required zero updates for this phase and it worked flawlessly. WEEEE!

With the compiler ready, I successfully got rocr-runtime to compile and install today! I was shocked. To test it, I threw together a quick script to see if it could detect the GPU. To absolutely no one's surprise, it didn't find any—in fact, it can't even communicate with the amdkfd driver yet.
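
My quick test isn't reproduced here, but a stand-in that does a similar job can be sketched as follows. It loads the runtime with dlopen and calls hsa_init through dlsym, so it builds without any HSA headers. The name probe_hsa_runtime and its negative return codes are hypothetical; hsa_init and hsa_shut_down are the real HSA entry points, and a return value of 0 corresponds to HSA_STATUS_SUCCESS.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* hsa_init and hsa_shut_down take no arguments and return an enum
 * status, which is treated as an int here. */
typedef int (*hsa_entry_fn)(void);

/* Hypothetical probe. Returns -1 if the library cannot be loaded,
 * -2 if the entry points are missing, otherwise hsa_init's status
 * (0 means success). */
int probe_hsa_runtime(const char *soname) {
    assert(soname != NULL);
    void *lib = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (lib == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }
    hsa_entry_fn init = (hsa_entry_fn)dlsym(lib, "hsa_init");
    hsa_entry_fn shut_down = (hsa_entry_fn)dlsym(lib, "hsa_shut_down");
    if (init == NULL || shut_down == NULL) {
        dlclose(lib);
        return -2;
    }
    int status = init();
    if (status == 0)
        shut_down(); /* only tear down what was successfully set up */
    dlclose(lib);
    return status;
}
```

Calling probe_hsa_runtime("libhsa-runtime64.so") is the obvious use, assuming the library keeps its Linux name on FreeBSD. A nonzero positive result means the runtime loaded but refused to initialize, which fits the situation where it cannot yet reach the amdkfd driver.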

HOWEVER: It gracefully hit the internal exception handler instead of hard-crashing the system! This is massive progress. It behaved exactly as expected. Here is the CMake configuration I used to get it built, using my custom branch at rocm-systems (Commit hash: 660429045eafe89636f9d17c08f8797037dedc19):

cmake \
 -DNUMA_LIBRARIES=/usr/local/lib/libnuma.so \
 -DNUMA_INCLUDE_DIR=/usr/local/include \
 -DClang_DIR=/usr/local/rocm-llvm/lib/cmake/clang/ \
 -DCMAKE_BUILD_TYPE=RelWithDebInfo \
 ..

The Anatomy of a 1,542-Line Patch

Getting to this point required roughly 1,542 lines of code changes across the runtime. Let's break down the most notable files I had to wrangle:


Closing Thoughts and Next Steps

I spent about 3 hours of deep, agonizing focus using GDB to trace through my test binary, stepping line by line through everything, including raw mutex allocations and atomic operations. The runtime loader is doing something incredibly bizarre: it actively attempts to load dxg (a core Windows DirectX component), and for some inexplicable reason on FreeBSD, the loader reports that it successfully loaded it?!

Because of this, the runtime is completely misidentifying the active driver paths. My primary goal for next Friday is to rip out this broken pathing logic, completely prevent it from trying to load Windows components, and point it toward the actual amdkfd driver so we can start making proper ioctl calls.

Providing a strict timeline right now is tough. I had to abruptly cut my tracing short today because a friend's apartment building unfortunately caught fire, and they needed a place to crash. This project currently fluctuates daily between "This is the perfect engineering challenge I didn't know I needed," and "This project is going to push my absolute sanity and understanding of computers to their breaking points."

Stay tuned! More patches (and plenty of ioctl suffering) will be published by next week. Let me know if you have any suggestions in the comments!