Essential insights from Hacker News discussions

Without the futex, it's futile

This Hacker News discussion revolves around the futex (Fast Userspace Mutex) system call, its design, purpose, and comparisons with other concurrency primitives and operating system features.

Futex: Core Concept and Purpose

The fundamental idea behind futexes is to provide a bridge between userspace and the kernel for efficient synchronization primitives. The primary goal is to avoid system calls for uncontended lock operations.

  • "It's not that deep. The futex was developed just to save you from issuing a special system call to ask the OS to put you on a wait queue." - afr0ck
  • "The whole point is that implementing a mutex requires doing things that only the privileged OS kernel can do (e.g. efficiently blocking/unblocking processes)." - afr0ck
  • "You actually issue the futex system call to get yourself on the wait queue tied to the memory address. It separates out the waiting from the locking." - viega

Futex vs. Traditional Approaches & Polling

Futexes are seen as a significant improvement over older methods, especially those involving busy-waiting or inefficient blocking calls.

  • "And that can absolutely save a bunch of system calls, especially vs. polling mixed with sleep() or similar." - viega
  • "Futex was added as an ad hoc solution to the obvious needs of SMP processes communicating via atomic memory operations who still wanted blocking IPC." - ajross

Futex and Synchronization Atomicity

A key technical point of discussion is how futexes maintain atomicity between checking the lock state and initiating a wait, especially in the face of race conditions.

  • "It does not, in fact the two are fundamentally inseparable and the state of the memory address must be treated atomically with the waiting state. The magic of futex is that you can use a hardware atomic operation (c.f. lock cmpxchg on x86) to get the lock in the common/uncontended case, but if you have to wait you need to tell the kernel both that you need to wait and the address on which you're waiting, so it can use the same hardware interlocks along with its own state locking to put you to sleep race-free." - ajross
  • "It quite does; the kernel is not the keeper of the lock, it only needs to detect the race condition that would result in a spurious sleep. It cares not one bit about the rest of your semantics." - viega

The "Handle-less" Nature of Futexes

A notable aspect of futexes is their lack of explicit handles, relying instead on memory addresses and kernel observation.

  • "I think the coolest part of the futex is that it's a handle-less concept. There's no allocation or deallocation via syscall, just a kernel-based memory watcher that turns out to be incredibly useful as a primitive." - mmastrac
  • "Everything goes cleanly away when there are no more waiters, and the kernel never even sees a mutex where there's no contention." - mmastrac

Kernel-Managed Queues and "Turnstiles"

The underlying kernel mechanisms for managing waiting threads are discussed, drawing parallels to concepts like "turnstiles" in other operating systems.

  • "At the same time you also don't want to call into the kernel's internal malloc() whenever a thread ends up blocking on a lock to allocate the data structures that are needed to keep track of queues of blocked threads for a given lock." - EdSchouten
  • "To prevent that, many operating systems allocate these 'queue objects' whenever threads are created and will attach a pointer to it from the thread object. Whenever a thread then stumbles upon a contended lock, it will effectively 'donate' this queue object to that lock, meaning that every lock having one or more waiters will have a linked list of 'queue objects' attached to it." - EdSchouten
  • "Solaris. There they called these 'queue objects' turnstiles. The BSDs adopted the same approach, and kept the same name." - EdSchouten

The Article's Premise and the "Straw Man" Argument

A portion of the discussion critiques the article's framing, with one user suggesting it sets up a "straw man" argument to promote a discussion about futexes. The article's author defends their motivation.

  • "The author of this article has set up a straw man to facilitate the writing and marketing of an otherwise moderately interesting article on futexes." - Normal_gaussian
  • "Personally I find the approach taken by this article more than a little distasteful - presenting from a point of exaggerated conflict is both tiresome and likely to confuse." - Normal_gaussian
  • "I wrote the article; it was motivated by reading the book, which to my estimation is not well aimed for either academics or practitioners." - viega
  • "The fact is, I was disappointed enough with the book, that I put aside another post I was writing for it." - viega

User-Space vs. Kernel-Space Design Choices

The broader theme of moving functionality from kernel space to user space for performance or other reasons is touched upon, with futexes being an example of kernel support for user-space synchronization.

  • "Is it just me or moving things out of kernel space improves performance in general? Like context switching, mutex or entire TCP stack." - nromiun
  • "Mentioning epoll doesn't actually do any IO though, so it doesn’t help with syscall latency. It just avoids the overhead of doing IO via a large number of threads (memory overhead, hard context switches, etc.)." - johncolanduoni
  • "You're 100% right that there are plenty of other considerations, often positive for lifting things out, like minimization of ring 0 attack surface." - viega

Complexity and Standardization in Concurrency Primitives

The discussion touches on the inherent complexity of synchronization primitives and the trade-offs involved in standardization, using C++ standards as an example.

  • "This is such a frustrating stance that most standards have, honestly. 'Well, obviously we can't expect the OS/language implementers to be able to reliably implement feature X ― let's just leave it to the application programmer to deal with; they are, after all, are expected to have great skill sets and could easily work around it'." - Joker_vD
  • "If you overspecify, you close the door to better implementations. This is why, for example, C++ standard hash tables and regexes are an order of magnitude slower than third party ones." - ori_b
  • "Every constraint that you specify potentially locks out a better implementation." - ori_b

Evolution of Futexes (Futex2 and Related Features)

Recent developments, particularly futex2 and its integration with io_uring, are highlighted as ongoing improvements.

  • "Windows gained a WaitForMultipleObjects, which Linux 5.16 (end 2021) aped with a new Futex2." - jauntywundrkind
  • "NUMA support (finally landing!), Io_uring support in 6.7 (2024) (with a nice write up on it speeding up postgresql aio)" - jauntywundrkind

Futex vs. Benaphore

A user inquires about the relationship and potential advantages of futexes over an older primitive, Benaphore.

  • "I still haven’t seen a good comparison between Futex and Benaphore. Benaphores I understand, it predates Futexes by almost a decade, but what do Futexes add to the equation since hardly anyone talks about Benaphores (or is it a case of not invented here)?" - smallstepforman