FreeBSD Full Native Port of ROCm: From Day One to Compiling the Runtime

Note: I used an LLM to rewrite parts 1 and 2 into this one larger post, since the originals were very poorly written.

Hello everyone! This is the first post of many to come. This term, I’m working at FreeBSD, and the project I've been assigned is a pretty tall order: port ROCm from Linux to FreeBSD. I'm not one to turn down a challenge like this, though—to be honest, it sounds like an absolute blast.

When I started, it took me about 2–3 days just to figure out how to navigate FreeBSD (and lowkey, I still need plenty of help), but I finally got kldload amdgpu to run successfully! For context, my main workstation is my personal laptop (Ryzen 7 7840HS, Radeon 780M, RTX 4060, 32GB RAM) running Linux, which I use as a reference. My actual development and testing rig is a FreeBSD Foundation Framework Desktop powered by a Strix Halo CPU. I initially spent about two days trying to get display graphics working on it before I decided to just drop it, go headless, and move on.

Pro tip for getting the drivers working: You need to pull firmware directly from the Linux firmware repository and use the absolute latest version of drm-kmod (don't grab it from ports), or it won't work. You might also need to recompile the kernel and world.

The Quest for the Super-Repo and the NUMA Problem

With the driver loading, it was time to move on to the important part: building the projects inside the ROCm super-repo. Looking through the project list, I spotted rocm-core and figured that was a logical place to start. I cloned the super-repo, ran CMake, ran make, and wrote a short test program to query the version. It worked on the very first try with zero issues. Incredible!

Naturally, my celebration was cut short. After a bit of Googling, I realized that rocr-runtime is where all the actual, critically important logic lives. When I tried to build that, it failed spectacularly. (To be fair, I would have been jumping for joy if it had worked out of the box.)

The runtime failed because it explicitly requires libdrm and libnuma. While libdrm on FreeBSD is fundamentally identical to its Linux counterpart, FreeBSD does not actually have a libnuma. Instead, non-uniform memory access is handled entirely by the operating system kernel, and it is handled very differently.

Linux vs. FreeBSD NUMA Architecture

Assuming you know what a NUMA memory region is and why it matters, here is the core issue: FreeBSD applies NUMA policy to processes and threads. A process chooses to bind itself to a specific NUMA domain, and consequently, the memory it allocates and touches comes from that domain. Linux, on the other hand, can attach policy to virtual memory areas (VMAs). It explicitly declares that memory address range X belongs to one domain, while address range Y belongs to another.

If you've done any sort of low-level development, you can immediately spot the architectural friction here. Furthermore, because FreeBSD exposes its NUMA policies through kernel interfaces (system calls such as cpuset_setdomain(2), plus sysctl nodes such as vm.ndomains) rather than through a libnuma-style library, there is no drop-in userland library to link against.
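To make the difference concrete, here is a minimal sketch of the FreeBSD side: ask the kernel how many domains exist, then bind the calling thread (not an address range) to one of them. The helper names numa_domain_count and bind_current_thread_to_domain are made up for illustration, and the round-robin policy over a one-domain set is just one way to pin allocations. None of this is taken from the actual port. On non-FreeBSD systems the sketch assumes a single domain and does nothing.

```c
#include <stddef.h>

#if defined(__FreeBSD__)
#include <sys/param.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/cpuset.h>
#include <sys/domainset.h>
#endif

/* Number of NUMA domains the kernel reports; 1 when it cannot be queried.
 * (Hypothetical helper name, not part of any library.) */
static int numa_domain_count(void)
{
#if defined(__FreeBSD__)
    int ndomains = 1;
    size_t len = sizeof(ndomains);

    if (sysctlbyname("vm.ndomains", &ndomains, &len, NULL, 0) == 0 &&
        ndomains > 0)
        return ndomains;
#endif
    return 1;
}

/* Restrict the calling thread's future page allocations to one domain.
 * Returns 0 on success, -1 on failure. (Hypothetical helper name.) */
static int bind_current_thread_to_domain(int domain)
{
    if (domain < 0 || domain >= numa_domain_count())
        return -1;

#if defined(__FreeBSD__)
    domainset_t mask;

    DOMAINSET_ZERO(&mask);
    DOMAINSET_SET(domain, &mask);
    /* id -1 with CPU_WHICH_TID means "the current thread". */
    return cpuset_setdomain(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                            sizeof(mask), &mask,
                            DOMAINSET_POLICY_ROUNDROBIN);
#else
    /* Fallback for this sketch only: one domain, nothing to bind. */
    return 0;
#endif
}
```

Note that there is no address or length argument anywhere: the policy follows the thread. A Linux-style mbind() call would instead take a start address and a length, which is exactly the mismatch a shim has to paper over.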

Writing the Translation Layer

To bypass this, I wrote a custom libnuma translation layer. I wanted to avoid making massive structural changes to the upstream ROCm source code because upstreaming those changes later would be an absolute nightmare. You can check out the translation library here.

If you look through the code, you'll notice it is incredibly lightweight. That’s intentional. I had an LLM scan through the codebase to find every unique function defined by numactl that was imported by the ROCm runtime, and then I manually implemented them. Since my current development rig is a single-domain setup, and I didn't want to rewrite FreeBSD's core memory management policies, my shim layer just returns 0 for all bindings. It's a bit hardcoded, but it works perfectly for a single domain, and we can make it elegant later.
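For a feel of what a single-domain shim looks like, here is a minimal sketch. The four functions are real libnuma entry points with their real signatures, but which functions the runtime actually imports is an assumption on my part, so treat the selection as illustrative rather than as a copy of the translation library.

```c
/* Single-domain stand-ins for a few libnuma entry points.
 * Every answer is hardcoded for a machine with exactly one NUMA node. */

/* libnuma returns -1 when NUMA is unusable; 0 tells callers to carry on. */
int numa_available(void)
{
    return 0;
}

/* Highest node number: with one node, that is node 0. */
int numa_max_node(void)
{
    return 0;
}

/* Number of configured nodes. */
int numa_num_configured_nodes(void)
{
    return 1;
}

/* Every CPU belongs to node 0 on a single-domain system. */
int numa_node_of_cpu(int cpu)
{
    (void)cpu;
    return 0;
}
```

Built as a shared library named libnuma.so and installed next to a matching header, something like this is enough for a build that only needs the symbols to resolve. It would give wrong answers on a real multi-domain machine.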

Building the Custom AMD Compiler Toolchain

Getting lost in the codebase sauce means I don't remember every single line I edited, so the rest of this breakdown is based directly on my scratchpad and git diffs. One massive realization I had is that you cannot do anything until you have AMD's custom LLVM compiler toolchain fully working. It does not compile out of the box on FreeBSD. It required several targeted patches, such as mapping Linux's CLOCK_MONOTONIC_RAW to FreeBSD's CLOCK_MONOTONIC_FAST (FreeBSD has no CLOCK_MONOTONIC_RAW), and a weird link to the environ variable.

Honestly, I was shocked that my environment variable hack worked, but it correctly pulls them at runtime. You can grab my compiler branch here: llvm-project-rocm (Commit hash: 8450d34b327abfe9e2994a6476003f33221ad8d5).

This is the exact CMake command I used to build the toolchain:

cmake -S llvm -B build \
 -DLLVM_ENABLE_PROJECTS="clang;lld" \
 -DLLVM_EXTERNAL_PROJECTS="amd-device-libs;comgr" \
 -DLLVM_EXTERNAL_AMD_DEVICE_LIBS_SOURCE_DIR="/root/dev/llvm-project/amd/device-libs" \
 -DLLVM_EXTERNAL_COMGR_SOURCE_DIR="/root/dev/llvm-project/amd/comgr" \
 -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" \
 -DCMAKE_BUILD_TYPE=RelWithDebInfo \
 -DCMAKE_INSTALL_PREFIX=/usr/local/rocm-llvm

A bizarre build quirk: If you change the build type to Debug, the build fails at the linking stage. It appears the build system aggressively tries to optimize specific sections and strips out necessary debug info in the process. If you want to replicate this headache, just switch the build type to Debug and watch it break.

Compiling the rocr-runtime

Good news on the NUMA front: my shoddily coded shim layer required zero updates for this phase and it worked flawlessly. WEEEE!

With the compiler ready, I successfully got rocr-runtime to compile and install today! I was shocked. To test it, I threw together a quick script to see if it could detect the GPU. To absolutely no one's surprise, it didn't find any—in fact, it can't even communicate with the amdkfd driver yet.

HOWEVER: It gracefully hit the internal exception handler instead of hard-crashing the system! This is massive progress. It behaved exactly as expected. Here is the CMake configuration I used to get it built, using my custom branch at rocm-systems (Commit hash: 660429045eafe89636f9d17c08f8797037dedc19):

cmake -DNUMA_LIBRARIES=/usr/local/lib/libnuma.so \
 -DNUMA_INCLUDE_DIR=/usr/local/include \
 -DClang_DIR=/usr/local/rocm-llvm/lib/cmake/clang/ \
 -DCMAKE_BUILD_TYPE=RelWithDebInfo ..

The Anatomy of a 1,542-Line Patch

Getting to this point required roughly 1,542 lines of code changes across the runtime. Let's break down the most notable files I had to wrangle:

hsakmt/freebsd/kfd_ioctl.h
This file required explicitly including <sys/mman.h> to resolve memory-mapping types. Shoving it directly into this header saved me from having to add it manually to over 10 different files that include it downstream.

This is also where I started some highly cursed, un-upstreamable workarounds. Linux defines a macro called _IOC_SIZESHIFT, computed from per-architecture bit-field widths, that says where the argument size sits inside an ioctl request number. FreeBSD has no such macro; its ioctl encoding simply places the size field at bit 16. So, I added a blunt #define _IOC_SIZESHIFT 16.
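As a rough illustration of why that define is enough, the sketch below pulls the payload size back out of an ioctl request number. The struct demo_query_args and the DEMO_IOC_QUERY request are hypothetical stand-ins, not real KFD ioctls, and the mask fallback is an assumption for systems that provide neither _IOC_SIZEMASK nor IOCPARM_MASK.

```c
#include <stddef.h>
#include <sys/ioctl.h>

/* Linux headers provide these; FreeBSD does not, so supply fallbacks. */
#ifndef _IOC_SIZESHIFT
#define _IOC_SIZESHIFT 16
#endif

#ifndef _IOC_SIZEMASK
#ifdef IOCPARM_MASK
#define _IOC_SIZEMASK IOCPARM_MASK   /* FreeBSD's name for the size mask */
#else
#define _IOC_SIZEMASK 0x3fffUL       /* assumed 14-bit size field */
#endif
#endif

/* Hypothetical argument struct and request number, for illustration only. */
struct demo_query_args {
    unsigned int gpu_id;
    unsigned int flags;
    unsigned long long address;
};

#define DEMO_IOC_QUERY _IOWR('K', 0x01, struct demo_query_args)

/* Extract the argument size that _IOWR encoded into the request number. */
static size_t ioctl_payload_size(unsigned long request)
{
    return (size_t)((request >> _IOC_SIZESHIFT) & _IOC_SIZEMASK);
}
```

Calling ioctl_payload_size(DEMO_IOC_QUERY) gives back sizeof(struct demo_query_args) on both systems, because both encodings keep the size in the bits starting at 16.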

Additionally, Linux uses MADV_DONTFORK to keep a memory region from being carried into the child during a process fork. FreeBSD doesn't have this; it uses minherit() instead. I wrote a static inline wrapper function and a macro to intercept and substitute the call. I also defined MAP_NORESERVE to 0. On Linux, the flag tells the kernel not to reserve swap for the mapping, and dropping it can make large reservations fail under strict overcommit; FreeBSD allocates memory lazily by default, so we can safely ignore it. Finally, I turned MADV_HUGEPAGE into a no-op macro since the man pages explicitly state it's only advisory.

libhsakmt/src/svm.c
Wrapped #include <alloca.h> inside a Linux-specific preprocessor macro, since that header doesn't exist on FreeBSD. Easy fix.

libhsakmt/src/topology.c
Accidentally deleted #include <sys/sysinfo.h> (oops). I need to go back and wrap that properly in an #ifdef __linux__ block. I also refactored several lines to leverage standard POSIX/Unix calls instead of Linux-specific shortcuts.
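For reference, this is the kind of portable replacement that can stand in for sysinfo()-style queries. Which calls topology.c actually uses is not something I can confirm, so the two helpers below (hypothetical names) only illustrate the pattern. _SC_NPROCESSORS_ONLN and _SC_PHYS_PAGES are not strictly POSIX, but both Linux and FreeBSD provide them.

```c
#include <unistd.h>

/* Number of CPUs currently online; falls back to 1 if the query fails. */
static long online_cpu_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n > 0 ? n : 1;
}

/* Total physical memory in bytes, or 0 if either query fails. */
static unsigned long long physical_memory_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0)
        return 0;
    return (unsigned long long)pages * (unsigned long long)page_size;
}
```

Because the same sysconf() names work on both systems, code written this way needs no #ifdef at the call site, which keeps the diff against upstream small.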

hsa-runtime/CMakeLists.txt
The default build file was fundamentally broken. The runtime relies heavily on C++20 features, yet the build script defaulted to C++17. On top of that, its internal logic assumed that any generic Unix-like target was automatically Linux. I patched it to properly recognize FreeBSD and pull my custom source files.

hsa-runtime/core/inc/amd_hsa_loader.hpp & amd_hsa_loader.cpp
An absolute mountain of preprocessor switch code and increasingly cursed macros to resolve OS-specific loader paths. HAHAHAH.

amd_topology.cpp
Honestly, I don't entirely know what's happening under the hood here, but modifying this was vital to get the system to construct the core GPU device lists. The original array initialization style was breaking on FreeBSD's compiler, so I refactored how the lists are instantiated.

amd_core_dump.cpp
Beyond header changes, I had to define a macro substituting Linux's SYS_gettid system call with FreeBSD's native thr_self(). Lowkey, I hope this works as intended, but I'm not deeply familiar with the underlying threading API yet. I also had to make several type conversions much more explicit to satisfy the compiler's strict error settings.
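One detail worth knowing about that substitution: thr_self() does not return the thread ID, it writes it through a pointer, so it cannot replace syscall(SYS_gettid) textually. A small wrapper avoids the mismatch. The sketch below uses the hypothetical name current_thread_id and is not the macro from the branch.

```c
#include <sys/types.h>
#include <unistd.h>

#if defined(__FreeBSD__)
#include <sys/thr.h>
#elif defined(__linux__)
#include <sys/syscall.h>
#endif

/* Kernel-level ID of the calling thread. (Hypothetical helper name.) */
static long current_thread_id(void)
{
#if defined(__FreeBSD__)
    long tid = 0;

    /* thr_self() stores the ID in *tid and returns 0 on success. */
    if (thr_self(&tid) != 0)
        return -1;
    return tid;
#elif defined(__linux__)
    return (long)syscall(SYS_gettid);
#else
    /* Fallback for this sketch: the process ID is the best we have. */
    return (long)getpid();
#endif
}
```

FreeBSD also offers pthread_getthreadid_np() in <pthread_np.h>, which returns the ID directly and may be a more convenient drop-in if pulling in the pthread headers is acceptable.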

amd_elf_image.cpp
A bit weird. I ended up forcing the image loader to drop back to an OS-agnostic fallback parsing method to get past formatting errors.

create_trap_handler.sh
I couldn't easily get Bash installed in my minimal dev environment, so I changed the shebang and modified the script to run under ksh.

freebsd/os_freebsd.cpp
I literally copied the entire original Linux implementation over, attempted a build to harvest all the compiler errors, and then fed them to an LLM for code solutions. I spent over an hour auditing its output to ensure it wouldn't immediately implode, but lowkey, I fully expect this file to break during runtime testing.

Closing Thoughts and Next Steps

I spent about 3 hours of deep, agonizing focus using GDB to trace through our test binary, stepping line-by-line through everything, including raw mutex allocations and atomic operations. The runtime loader is doing something incredibly bizarre: it actively attempts to load dxg (a core Windows DirectX component), and for some inexplicable reason on FreeBSD, the loader reports that it successfully loaded it?!

Because of this, the runtime is completely misidentifying the active driver paths. My primary goal for next Friday is to rip out this broken pathing logic, completely prevent it from trying to load Windows components, and point it toward the actual amdkfd driver so we can start making proper ioctl calls.

Providing a strict timeline right now is tough. I had to abruptly cut my tracing short today because a friend's apartment building unfortunately caught fire, and they needed a place to crash. This project currently fluctuates daily between "This is the perfect engineering challenge I didn't know I needed," and "This project is going to push my absolute sanity and understanding of computers to their breaking points."

Stay tuned! More patches (and plenty of ioctl suffering) will be published by next week. Let me know if you have any suggestions in the comments!