Special Edition: BSDCan Side Quests, Tomfoolery, and Shenanigans (2026/06/22)

I went to BSDCan last week(end), and yeah, I was not able to do much technical stuff, but I did learn quite a bit: how FreeBSD's virtual memory sub-system works, the fact that drm-kmod does work on a Strix Halo (and that you can run inference on it), plus some suggestions for NUMA stuff. I should also state that I've tried to write the past 2 blog posts with a bit of narrative structure so they're easier and more fun to follow, but because my progress has been scattered and confusing, I decided to just write out a bunch of disjointed stories here instead.

The guy who got a burger with 8 patties, no cheese, and got 6 more hotdogs, and finished it.

So I booked my return trip to start in Kanata... which is not Ottawa, so I had to take a series of trains/buses to get to the station. By the time I got there, I was hungry and had ~1 hr to spare, so I walked to a nearby Harvey's and ordered some food. Now if you've ever been to a Harvey's, you'll know that they make the burger in front of you. The guy before me had ordered 8 extra beef patties for his burger, and another 4 hot dogs, along with an extra-large poutine. He said it was his lunch and he wanted to treat himself. Was very interesting.

HMM Mark Johnston Suggestions

HMM stuff, how the memory sub-system works

During the conference's closing reception I had the chance to talk to Mark Johnston about some stuff that I'm a bit unclear about: NUMA and HMM support.

Before I start talking abt what he told me, I should probably briefly talk about HMM (Heterogeneous Memory Management) since it's a seldom-known topic, and its implications are not well understood. So, imagine you're running a high-end restaurant kitchen. For years you only had one main chef (the CPU). To do their job, the chef relied entirely on the main pantry (the System RAM). It was simple; the chef needed ingredients, they went to the pantry, the dinner was served.

But today, kitchens are wayyy more complex, and there are now super-specialized chefs, like a super-fast vegetable chopper (the GPU) or a super-fast and efficient drink mixer (???? AI Accelerator, NPU, Co-Processor, who knows, insert whatever you want here). But there's a small problem: these super special chefs all have their own pantry/fridge/spice rack. If the main chef (the CPU) always has to hand-deliver ingredients from the main pantry to the GPU's mini-fridge, everything slows down.

This is where Heterogeneous Memory Management comes in. It's the ultimate kitchen coordinator—it makes all these different storage areas look and act like one giant seamless pantry. (Lowkey I'm dependent on HMM as a GPU programmer.)

The core problem that HMM solves is that CPUs, GPUs, NPUs, DSPs, etc. all lived in different worlds and spoke different languages when it came to memory. If a program wanted a GPU to process some data, the CPU had to explicitly copy that data out of system RAM, send it over a comparatively slow link (PCIe), and paste it into the GPU's memory. And once the processing was done, gotta do it all again in reverse.
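To make that round trip concrete, here's a tiny toy model of the explicit-copy workflow. Nothing in it is a real GPU API; `ToyGpu`, `upload`, `run_kernel`, and `download` are names I made up, and the "PCIe traffic" is just a byte counter. It's a sketch of the shape of the problem, not how any driver does it.

```python
# Toy model of the pre-HMM workflow: every job pays for two explicit copies.
# All names here are hypothetical; no real GPU API is involved.

class ToyGpu:
    def __init__(self):
        self.vram = {}          # device-side buffers, keyed by a handle
        self.bytes_copied = 0   # traffic over the (pretend) PCIe link

    def upload(self, handle, host_data):
        # Explicit host -> device copy.
        self.vram[handle] = bytes(host_data)
        self.bytes_copied += len(host_data)

    def run_kernel(self, handle):
        # "Kernel": add 1 to every byte, entirely in device memory.
        self.vram[handle] = bytes((b + 1) % 256 for b in self.vram[handle])

    def download(self, handle):
        # Explicit device -> host copy.
        data = self.vram[handle]
        self.bytes_copied += len(data)
        return bytearray(data)


host_buffer = bytearray([1, 2, 3, 4])
gpu = ToyGpu()
gpu.upload("buf0", host_buffer)      # copy in
gpu.run_kernel("buf0")               # process on the device
host_buffer = gpu.download("buf0")   # copy back out
print(list(host_buffer), gpu.bytes_copied)
```

Four bytes of data end up costing eight bytes of traffic, and the program has to orchestrate every step by hand. That bookkeeping is the part HMM is trying to make go away.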

Enter HMM: Shared Virtual Memory (My GOAT). HMM bridges this gap by creating a shared virtual address space. Instead of forcing the CPU and GPU to each manage their own separate map of memory, HMM tricks them into thinking they are looking at the exact same blueprint of memory.

Under the hood it relies on a few things: Pointer Sharing, Demand Paging, and Hardware Coherency stuff (I'm a bit foggy on this last one). Because the CPU and GPU share a virtual map, a programmer can use a single pointer that both the CPU and GPU understand, meaning that either side knows where to look without the program explicitly copying any data over. Next there's Demand Paging (which is what drives Page Migration), where data does not move until it has to. If the GPU needs a chunk of data that is currently sitting in system RAM, a "Page Fault" occurs. HMM catches this fault and transparently migrates those pages of data over to the GPU's fast local memory in the background. The programmer doesn't know :D. Finally, there's a level of coherency needed, cause what happens when the CPU and GPU try to edit the same bit of data at the same time? On Linux, as far as I understand it, HMM handles the synchronization in software: when a mapping changes on the CPU side, the GPU's copy of that mapping gets invalidated, and a page that has been migrated into GPU memory gets pulled back to system RAM when the CPU touches it.
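Here's the demand paging idea as a toy simulation, since it's easier to see than to describe. Everything in it is hypothetical (`ToySharedMemory`, the `"ram"`/`"vram"` labels, the 4096-byte page size), and it assumes the simplest possible policy where a page lives in exactly one place at a time. The real kernel machinery is far more involved; this only shows "one pointer, pages move on fault".

```python
# Toy model of fault-driven page migration. Hypothetical names throughout;
# this is NOT how the kernel implements it, just the shape of the idea.

PAGE_SIZE = 4096  # assumed page size for the sketch


class ToySharedMemory:
    def __init__(self, num_pages):
        # Every page starts out resident in system RAM.
        self.location = {page: "ram" for page in range(num_pages)}
        self.migrations = 0

    def access(self, who, address):
        """who is 'cpu' or 'gpu'; address is a shared virtual address."""
        page = address // PAGE_SIZE
        wanted = "ram" if who == "cpu" else "vram"
        if self.location[page] != wanted:
            # "Page fault": the page is not where the accessor needs it,
            # so migrate that one page before the access proceeds.
            self.location[page] = wanted
            self.migrations += 1
        return page


mem = ToySharedMemory(num_pages=4)
pointer = 2 * PAGE_SIZE + 16      # one pointer, valid for both sides

mem.access("cpu", pointer)        # already in RAM: no fault
mem.access("gpu", pointer)        # fault, page 2 migrates to VRAM
mem.access("gpu", pointer + 8)    # same page, already in VRAM: no fault
mem.access("cpu", pointer)        # CPU touches it again, page migrates back

print(mem.location[2], mem.migrations)
```

Only the page that actually gets touched ever moves, and the calling code never issues a copy. It just uses the pointer.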

This matters because LLMs and 3D graphics often require moving terabytes of data back and forth between processors. HMM makes this easier. Also makes developers happier, and yeah!

Now I did have a quick convo with Mark during the mixer, and I told him something along the lines of "Previously Linux didn't have a method to deal with this kinda stuff, so each driver did their own thing. Now it seems like Linux is making a more unified abstraction for this, but I have no idea how to start + [me talking abt irrelevant things thinking they were relevant]". If I'm not mistaken, HMM has already been merged into Linux. So I was wondering: does FreeBSD have an HMM abstraction, and if not, where would I need to look? He said that FreeBSD has a much more 'mature' memory management setup, where I can just create a new struct and create the corresponding pointers, and then should be good to go. Still a lot of work, but generally a bit better than on Linux. I.e., FreeBSD's memory management system is a lot more object-oriented, and because of that modular design, it should be a lot easier.

On a related note, about the ramblings I thought were useful but which turned out to be me making a fool of myself: I do not need to care about NUMA. It literally only matters for performance, and I do not care about performance in a Proof of Concept.

Sashimi

There was a place we went to which had an interesting mix of cuisines; they also had sashimi. Now I haven't had sashimi in a while, and someone else at the table wanted to get it, saying they'd only get it if someone else did, so I said I'd get it with them, and we got it. And it was so worth it. The sashimi was definitely on the better side of what I've had. It came in slices of a proper thickness (not shavings or a block), and had a much better texture than what I'd had before. I think the restaurant was called Side Door.

Would Not Recommend FlixBus

I was on the overnight buses, which were like 8~12 hours, and it was not a fun experience. You get all the downsides of taking a plane plus the downsides of taking a bus. The only upside is cost, but it comes at the cost of your sanity: can't sleep, cramped, oftentimes loud, hard to concentrate, just overall not a fun time. Would not recommend if you can avoid it. On an unrelated note, I was talking to the taxi driver after I got back to Waterloo, and he said he'd just gotten a request to go to Montreal and pick someone up lol

The terrible lightning talk I gave

Now if you were at the conference, you probably went to the lightning talks. I decided the day before to give one, so I had ~12 hours to make a presentation and submit it (yes, the deadline to submit was 9am, but I was gonna fall asleep and forget, so I just submitted mine by midnight). Now we also gotta consider: on Thursday I had a special relativity and classical mechanics midterm, followed by needing to teleport to the bus stop a few hours later. Then I couldn't sleep on the ~8 hour bus ride to Ottawa, and still had to hold convos, understand presentations, ask questions, and then make the presentation. I was very sleep-deprived and ended up bombing it: talked too fast, yeah...... The presentation was on ROCm on FreeBSD progress and needing some help. I did catch a few ppl tho; someone told me that if you use the Vulkan backend, you can get a lot more performance (both on Linux and FreeBSD) on Strix Halo. You can find the talk online if you want to see how badly it went.

Omg I actually had to read ASM to debug an issue, it's been so long, but so refreshing

So on a related note, after I got amdgpu loading, I looked into why it was crashing. Took a few minutes, but I tracked down the function: it was smth along the lines of loading the PSP. Now I read somewhere that the IP version which gets compared against the IP_VERSION macro for Strix Halo should be like 14.0.5 (this might've been a mistake on my end). So I commented out the code which was not being used for 14.0.5. Now keep that in mind and look at the two following images.
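Quick aside before the images, since decoding the register matters in a second. If I remember right, amdgpu packs a version like 14.0.5 into a single integer, and that integer is what a switch statement ends up comparing. The sketch below assumes the older layout of major << 16 | minor << 8 | revision; as far as I know, newer kernels shift everything up to make room for extra fields, so check the header in whatever tree you're building against. `pack_ip_version` and `unpack_ip_version` are my own hypothetical helpers, not driver functions.

```python
# Packing and unpacking a three-part IP version.
# ASSUMPTION: layout is major << 16 | minor << 8 | revision. The real macro
# in your kernel tree may use different shifts; verify before trusting this.

def pack_ip_version(major, minor, rev):
    return (major << 16) | (minor << 8) | rev


def unpack_ip_version(value):
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


expected = pack_ip_version(14, 0, 5)   # what I thought the hardware reports
observed = 0x0E0001                    # example raw value under the assumed layout

print(hex(expected), unpack_ip_version(observed))
print("match" if observed == expected else "mismatch")
```

Under that assumed layout, two versions that differ only in the revision differ only in the lowest byte, which is easy to miss when you're staring at a register dump.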

Now if u look at the register I printed out, that's the one being compared against in the switch statement. If you decode it, it's 14.0.1, meaning that the Strix Halo is identifying as 14.0.1. I commented out the function which initializes the function pointers, so when it dereferences them, it