Note: I rewrote parts 1 and 2 into this one larger post using an LLM, since they were very poorly written at the time.
Hello everyone! This is the first post of many to come. This term, I’m working at FreeBSD, and the project I've been assigned is a pretty tall order: port ROCm from Linux to FreeBSD. I'm not one to turn down a challenge like this, though—to be honest, it sounds like an absolute blast.
When I started, it took me about 2–3 days just to figure out how to navigate FreeBSD (and lowkey, I still need plenty of help), but I finally got it to successfully run kldload amdgpu! For context, my main workstation is my personal laptop (Ryzen 7 7840HS, Radeon 780M, RTX 4060, 32GB RAM) running Linux, which I use as a reference. My actual development and testing rig is a FreeBSD Foundation Framework Desktop powered by a Strix Halo CPU. I initially spent about two days trying to get display graphics working on it before deciding to just drop it, go headless, and move on.
Pro tip for getting the drivers working: You need to pull firmware directly from the Linux firmware repository and use the absolute latest version of drm-kmod (don't grab it from ports), or it won't work. You might also need to recompile the kernel and world.
With the driver loading, it was time to move on to the important part: building the projects inside the ROCm super-repo. Looking through the project list, I spotted rocm-core and figured that was a logical place to start. I cloned the super-repo, ran CMake, ran make, and wrote a short test program to query the version. It worked on the very first try with zero issues. Incredible!
Naturally, my celebration was cut short. After a bit of Googling, I realized that rocr-runtime is where all the actual, critically important logic lives. When I tried to build that, it failed spectacularly. (To be fair, I would have been jumping for joy if it had worked out of the box.)
The runtime failed because it explicitly requires libdrm and libnuma. While libdrm on FreeBSD is fundamentally identical to its Linux counterpart, FreeBSD does not actually have a libnuma. Instead, non-uniform memory access is handled entirely by the operating system kernel, and it is handled very differently.
Assuming you know what a NUMA memory region is and why it matters, here is the core issue: FreeBSD handles NUMA mapping via processes and threads. A process chooses to bind itself to a specific NUMA region, and consequently, whatever it touches, reads, or writes becomes part of that assigned region. Linux, on the other hand, operates on virtual memory areas (VMAs): it explicitly declares that memory address range X belongs to one domain, while address range Y belongs to another.
If you've done any sort of low-level development, you can immediately spot the architectural friction here. Furthermore, FreeBSD exposes its NUMA policies through kernel interfaces (system calls such as cpuset_setdomain(2), plus sysctl nodes for topology), so there is no libnuma-style userland library to link against.
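For a feel of what "ask the kernel directly" looks like, here is a minimal sketch that reads the number of memory domains through sysctl. The helper name numa_domain_count is made up for illustration and is not part of the shim; on anything other than FreeBSD it simply reports one domain.

```c
#include <stddef.h>

#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: how many NUMA domains does the kernel report?
 * Falls back to 1 when the sysctl is unavailable or on non-FreeBSD hosts. */
static int numa_domain_count(void)
{
#if defined(__FreeBSD__)
    int ndomains = 0;
    size_t len = sizeof(ndomains);

    /* vm.ndomains is a read-only integer sysctl node. */
    if (sysctlbyname("vm.ndomains", &ndomains, &len, NULL, 0) == 0 &&
        ndomains > 0)
        return ndomains;
#endif
    return 1;
}
```

On a single-domain machine like the development rig described here, this would be expected to return 1.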
To bypass this, I wrote a custom libnuma translation layer. I wanted to avoid making massive structural changes to the upstream ROCm source code because upstreaming those changes later would be an absolute nightmare. You can check out the translation library here.
If you look through the code, you'll notice it is incredibly lightweight. That’s intentional. I had an LLM scan through the codebase to find every unique function defined by numactl that was imported by the ROCm runtime, and then I manually implemented them. Since my current development rig is a single-domain setup, and I didn't want to rewrite FreeBSD's core memory management policies, my shim layer just returns 0 for all bindings. It's a bit hardcoded, but it works perfectly for a single domain, and we can make it elegant later.
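To give a rough idea of the shape of such a shim, here is a minimal sketch, not the actual library. The function names are real libnuma entry points, but which ones the runtime imports and what the real shim returns for each are assumptions on my part; in particular, reporting one configured node is my guess at what "single domain" should mean for that call.

```c
/* Single-domain stand-ins for a few libnuma entry points.
 * Everything is pinned to node 0; nothing here talks to the kernel. */

/* libnuma returns -1 when NUMA is unavailable; 0 means "usable". */
int numa_available(void)
{
    return 0;
}

/* Highest node number in the system: only node 0 exists. */
int numa_max_node(void)
{
    return 0;
}

/* Number of configured memory nodes (assumed to be 1 here). */
int numa_num_configured_nodes(void)
{
    return 1;
}

/* Every CPU is reported as belonging to node 0. */
int numa_node_of_cpu(int cpu)
{
    (void)cpu; /* unused: there is only one domain */
    return 0;
}
```

The upside of this approach is that the upstream code keeps calling the libnuma API it already expects, so no call sites need to change.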
Getting lost in the codebase sauce means I don't remember every single line I edited, so the rest of this breakdown is based directly on my scratchpad and git diffs. One massive realization I had is that you cannot do anything until you have AMD's custom LLVM compiler toolchain fully working. It does not compile out of the box on FreeBSD. It required several targeted patches, such as mapping Linux's CLOCK_MONOTONIC_RAW onto FreeBSD's CLOCK_MONOTONIC_FAST, and a weird link to the environ variable.
Honestly, I was shocked that my environment variable hack worked, but it correctly pulls the environment variables at runtime. You can grab my compiler branch here: llvm-project-rocm (Commit hash: 8450d34b327abfe9e2994a6476003f33221ad8d5).
This is the exact CMake command I used to build the toolchain:
cmake -S llvm -B build \
  -DLLVM_ENABLE_PROJECTS="clang;lld" \
  -DLLVM_EXTERNAL_PROJECTS="amd-device-libs;comgr" \
  -DLLVM_EXTERNAL_AMD_DEVICE_LIBS_SOURCE_DIR="/root/dev/llvm-project/amd/device-libs" \
  -DLLVM_EXTERNAL_COMGR_SOURCE_DIR="/root/dev/llvm-project/amd/comgr" \
  -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_INSTALL_PREFIX=/usr/local/rocm-llvm
A bizarre build quirk: If you change the build type to Debug, the build fails at the linking stage. It appears the build system aggressively tries to optimize specific sections and strips out necessary debug info in the process. If you want to replicate this headache, just switch the build type to Debug and watch it break.
Good news on the NUMA front: my shoddily coded shim layer required zero updates for this phase and it worked flawlessly. WEEEE!
With the compiler ready, I successfully got rocr-runtime to compile and install today! I was shocked. To test it, I threw together a quick test program to see if it could detect the GPU. To absolutely no one's surprise, it didn't find any. In fact, it can't even communicate with the amdkfd driver yet.
HOWEVER: It gracefully hit the internal exception handler instead of hard-crashing the system! This is massive progress. It behaved exactly as expected. Here is the CMake configuration I used to get it built, using my custom branch at rocm-systems (Commit hash: 660429045eafe89636f9d17c08f8797037dedc19):
cmake -DNUMA_LIBRARIES=/usr/local/lib/libnuma.so -DNUMA_INCLUDE_DIR=/usr/local/include -DClang_DIR=/usr/local/rocm-llvm/lib/cmake/clang/ -DCMAKE_BUILD_TYPE=RelWithDeb ..
Getting to this point required 1,542 lines of code changes across the runtime. Let's break down the most notable files I had to wrangle:
hsakmt/freebsd/kfd_ioctl.h
Included <sys/mman.h> to resolve memory-mapping types. Shoving it directly into this header saved me from having to add it manually to the 10+ downstream files that include it.
Linux defines _IOC_SIZESHIFT because ioctl sizes can shift depending on hardware; FreeBSD just hardcodes this value to 16. So, I added a blunt #define _IOC_SIZESHIFT 16.
Linux uses MADV_DONTFORK to keep memory allocations from copying over during a process fork. FreeBSD doesn't have this; it uses minherit() instead. I wrote a static inline wrapper function and a macro to intercept and substitute the call. I also defined MAP_NORESERVE to 0. On Linux, ignoring this causes strict allocation errors, but FreeBSD utilizes lazy memory allocation by default, so we can safely ignore it. Finally, I turned MADV_HUGEPAGE into a no-op macro since the man pages explicitly state it’s completely optional.
libhsakmt/src/svm.c
Wrapped #include <alloca.h> inside a Linux-specific preprocessor guard, since that header doesn't exist on FreeBSD. Easy fix.
libhsakmt/src/topology.c
Dropped #include <sys/sysinfo.h> outright (oops). I need to go back and wrap that properly in an #ifdef __linux__ block. I also refactored several lines to leverage standard POSIX/Unix calls instead of Linux-specific shortcuts.
hsa-runtime/CMakeLists.txt
hsa-runtime/core/inc/amd_hsa_loader.hpp & amd_hsa_loader.cpp
amd_topology.cpp
amd_core_dump.cpp
Replaced the SYS_gettid system call with FreeBSD's native thr_self(). Lowkey, I hope this works as intended, but I'm not deeply familiar with the underlying threading API yet. I also had to make several type conversions much more explicit to satisfy the compiler's strict error settings.
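A portable thread-id lookup along those lines could be sketched like this. The wrapper name compat_gettid is hypothetical and the actual patch may be structured differently; the sketch only shows the two native calls side by side.

```c
#include <stdint.h>

#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/thr.h>
#elif defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* Hypothetical wrapper: kernel-level id of the calling thread,
 * or -1 on platforms this sketch does not cover. */
static inline int64_t compat_gettid(void)
{
#if defined(__FreeBSD__)
    long tid = 0;

    /* thr_self() stores the calling thread's id in *tid. */
    if (thr_self(&tid) != 0)
        return -1;
    return (int64_t)tid;
#elif defined(__linux__)
    return (int64_t)syscall(SYS_gettid);
#else
    return -1;
#endif
}
```

Returning a fixed-width integer sidesteps the type mismatch between the long that thr_self() fills in and the pid_t-sized value Linux returns, which is the sort of implicit conversion a strict compiler configuration tends to reject.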
amd_elf_image.cpp
create_trap_handler.sh
ksh.
freebsd/os_freebsd.cpp
I spent about 3 hours of deep, agonizing focus using GDB to trace through our test binary, stepping line-by-line through everything, including raw mutex allocations and atomic operations. The runtime loader is doing something incredibly bizarre: it actively attempts to load dxg (a core Windows DirectX component), and for some inexplicable reason on FreeBSD, the loader reports that it successfully loaded it?!
Because of this, the runtime is completely misidentifying the active driver paths. My primary goal for next Friday is to rip out this broken pathing logic, completely prevent it from trying to load Windows components, and point it toward the actual amdkfd driver so we can start making proper ioctl calls.
Providing a strict timeline right now is tough. I had to abruptly cut my tracing short today because a friend's apartment building unfortunately caught fire, and they needed a place to crash. This project currently fluctuates daily between "This is the perfect engineering challenge I didn't know I needed," and "This project is going to push my absolute sanity and understanding of computers to their breaking points."
Stay tuned! More patches (and plenty of ioctl suffering) will be published by next week. Let me know if you have any suggestions in the comments!