Open source, supposed to perform well, Linux API and ABI compatible? Interesting. Does it achieve that?
DOE's FAST-OS? IBM's PERCS?
Good call on predicting the GHz wall
[24, 1, 20] I agree that maintenance is hard, but mostly because of drivers and backwards compatibility [see Silas B-W paper about malicious drivers in Linux]
Yes! Make a clean break to 64-bit.
YES! Incremental changes are boring.
[19], sub-OSes are time-multiplexed, see Virtual Memory Managers
[25, 16, 9, 22] LARGE-SCOPE OS research!
So Linux code can run on it, is it slower/faster? Is this even good? It will obviously make the OS more usable...
Haha, designed to avoid Big Kernel Lock
Aim for portability in the sense of problem domains, and embedded/server + CPU archs
"Open up for experimentation parts of the system that are traditionally accessible only to experts"??? Make the system easier to understand? Or modular enough that you can focus on one aspect?
Nice, built-in performance monitoring infrastructure
[17, 10] -- poor locality means poor performance, poor scalability, discussed here.
PPC (Protected Procedure Call) -- hand-off scheduling, handled on the same processor -- strategy makes sense.
Malloc uses per-processor pools ((But what if tasks on one processor are more memory hungry than on another?))
A ``service'' is structured as a set of dynamically interconnected object instances that are lazily constructed... COOL
Objects, like the process object, can be structured to optimize for the concurrent access patterns expected (e.g. if threads on many cores page fault and need to access the process object's data)
User-mode scheduling, don't block stuff in the kernel, let page faults be handled by the process
Queue up work that awaits an interrupt or IPC, rather than "blocking in the kernel" -- but isn't that just like blocking?
External file servers! nice
Cool, be able to page out non-critical (i.e. user data) parts of the kernel
Unix semantics = fork & Copy-On-Write????
Pluggable paging impls, like region-specific behaviour
Keep NUMA in mind, and use larger pages (or any page size desired)
per-process "file/memory" management seems flexible and portable.
[34], [4, 5, 6] Cool, hot-updates to the OS
Probes can be hot-swapped in and out.
[27] quiescence, hot swapping Linux drivers -- THE NATURAL EVOLUTION OF driver development paradigms -- it's called RCU??
Has user-level impl of kernel services -- therefore ukernel like? Though they claim to come from exokernel designs
So threads are basically user-level only: kernel allocates memory and user schedules them. Therefore threads don't really take up resources, kind of like dynamic page allocation
*** only the "dispatcher" takes up space, the user scheduler can do w/e else while a thread is blocked on a PPC
Kernel scheduler runs on each proc, deciding which dispatcher to run ***
^^^ why were user-level threads bad before? Because the kernel couldn't multiplex them? But I think this gets around it?
For faster PPCs with more data than can fit in registers, there is the PPC page, which is swapped on context switch. -- more IPC context switch hacks -- and a process can reduce context switch overhead by saying how much of the PPC page it is using.
[33, 12] KFS
[39] well-integrated performance monitoring; having a graphical analysis tool of the perf logs is key!
I like that they reuse Linux's drivers and userland, and TCP/IP stack! But it's on PowerPC only?
[28, 40] modular system makes it more accessible to prototyping things like mallocs and schedulers

K42 sounds like it has taken exactly the right approach for a research project: open source code that reuses Linux's existing open source drivers and userland, targets 64-bit multiprocessor machines, scales to both small and large computers, and aims for revolutionary rather than incremental changes to traditional OS design.
I am also impressed that the authors predicted the GHz wall, which was hit at about 4 GHz in 2006. I am further impressed that they can achieve native speed with unmodified Linux applications. Unfortunately the system currently only runs on the PowerPC architecture, which may explain why I haven't seen it in wider use.

The design goals are similar to what I've seen in other papers: avoid global state and locks, portability, simplicity and modularity, and built-in performance monitoring. K42 achieves much of its scalability by avoiding global state and locks. Scalability is further improved by maximizing the locality of all services and data structures in the kernel. System calls use a "Protected Procedure Call" (PPC) model that performs a scheduler hand-off and handles the call on the same processor as the caller. Because this model is decoupled from the hardware, it also makes the system easier to port to new platforms.

The modularity of the system makes experimentation easier: users don't need to understand the whole system to implement, for example, a new scheduler. Furthermore, every service provided by the user-level libraries of the OS is pluggable and extensible. New memory allocation strategies can be dynamically enabled on a per-node, per-process, or even per-thread basis. For performance, on top of locality, K42 can page out less critical parts of the kernel's memory, such as user data.

To support dynamically plugging in kernel services and structures, the authors implemented a hot update system for the OS. Not only does this mean that individual applications can have their own optimized kernel services, but also that these services can be installed and uninstalled at runtime without rebooting the system. The underlying idea, waiting for a quiescent state before swapping out code that may still be in use, is closely related to Read-Copy Update (RCU) [27], which was integrated into the Linux kernel in the early 2000s and seems a natural fit for driver development.

Threads in K42 are user-level and user-scheduled; the dispatcher that schedules user threads is itself the entity scheduled by the kernel.
I believe this resolves the tradeoff between user threads, which are easy to implement and use, and kernel threads, which the kernel can schedule at a fine granularity. If, for example, a user thread blocks on the CPU where it is running, the user-level thread scheduler, which is part of the OS's user-level library rather than the kernel proper, can schedule other user threads in its place. This further means that threads are allocated lazily: if a user program initially allocates a large thread pool but only uses a single thread, there is minimal overhead for the kernel and the rest of the system.

Again we see IPC speed hacks in K42, except that instead of only optimizing the case where the message fits in registers, K42 allocates a per-processor "PPC page" that can be used to transfer up to a page of data between user and kernel on a system call. It is considered part of the processor's context and is swapped on a context switch. A process can also declare how much of the page it is actually using, which reduces the context-switch overhead when the full page is not needed.

Finally, K42's developers considered the built-in performance monitoring key to pinpointing system and user performance problems. The monitoring system is itself scalable because it was a design principle from the start. It can be dynamically turned on and off with minimal overhead to the system. The developers even went as far as writing special analysis tools to read and visualize the logging output of their performance counters, another strategy they emphasize as important for understanding their system. These counters also work with native Linux applications, meaning K42 can be used to profile Linux applications and libraries.
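
To make the last point concrete for myself, here is a minimal sketch of how monitoring can be both scalable and cheap to switch off. It assumes one trace buffer per processor (so logging never touches another CPU's data) and a global mask of enabled event classes (so the disabled path is a single test). All names here (`Tracer`, `CpuTraceBuffer`, `TRACE_SCHED`, `TRACE_MEM`) are hypothetical and chosen for illustration; this is not K42's actual tracing interface, and Python is used only for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical event classes; K42's real trace classes are not given in these notes.
TRACE_SCHED = 1 << 0
TRACE_MEM = 1 << 1


@dataclass
class CpuTraceBuffer:
    """Fixed-size ring of events owned by exactly one (simulated) processor."""
    capacity: int
    events: list = field(default_factory=list)
    dropped: int = 0

    def append(self, event):
        if len(self.events) >= self.capacity:
            # Discard the oldest entry, as a ring buffer would.
            self.events.pop(0)
            self.dropped += 1
        self.events.append(event)


class Tracer:
    def __init__(self, ncpus, capacity=1024):
        self.mask = 0  # all event classes start disabled
        # One buffer per processor: logging stays local to the calling CPU.
        self.buffers = [CpuTraceBuffer(capacity) for _ in range(ncpus)]

    def enable(self, classes):
        self.mask |= classes

    def disable(self, classes):
        self.mask &= ~classes

    def log(self, cpu, event_class, payload):
        # When the class is disabled, the only cost is this mask test.
        if not (self.mask & event_class):
            return False
        self.buffers[cpu].append((event_class, payload))
        return True


tracer = Tracer(ncpus=2, capacity=2)
tracer.log(0, TRACE_SCHED, "ignored: tracing is off")
tracer.enable(TRACE_SCHED)
tracer.log(0, TRACE_SCHED, "dispatch A")
tracer.log(1, TRACE_SCHED, "dispatch B")
tracer.log(1, TRACE_MEM, "ignored: class not enabled")
tracer.disable(TRACE_SCHED)
tracer.log(0, TRACE_SCHED, "ignored: disabled again")
```

A real kernel would presumably use lock-free, fixed-size per-CPU buffers rather than Python lists, but the shape is the same: the per-processor buffers give the locality and the mask gives the cheap on/off switch.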