Essential insights from Hacker News discussions

io_uring, kTLS and Rust for zero syscall HTTPS server

This Hacker News discussion revolves around a technical article describing a web server built on io_uring, and it explores advanced I/O and concurrency concepts, primarily in the context of Rust. Here's a summary of the key themes:

The Power and Complexity of io_uring

The core of the discussion centers on io_uring, a Linux kernel interface for asynchronous I/O. Many users express excitement and curiosity about its capabilities, seeing it as a significant advancement for high-performance networking; a minimal sketch of its submit/complete flow follows the quotes below.

  • "This is really cool. I've been thinking about something similar for a long time and I'm glad someone has finally done it." - sandeep-nambiar
  • "I'd like to see DPDK style full kernel bypass next" - ValtteriL
  • "Unfortunately io_uring is disabled by default on most cloud workload orchestrators, like CloudRun, GKE, EKS and even local Docker. Hope this will change soon, but until then it will remain very niche." - alde
  • "this is impressive but it’s also an amazing amount of complexity and difficult programming to work around the fact that syscalls are so slow." - api
  • "For anyone wanting to learn more about how to create a small server with io_uring: https://unixism.net/2020/04/io-uring-by-example-article-seri..." - klaussilveira
  • "So, current status on async Rust - you need to understand: Futures, Pin, Waker, async runtimes, Send/Sync bounds, async trait objects, etc." - npalli

Alternatives to Traditional Debugging Tools

Users discuss which tools are suitable for inspecting system behavior when strace is not sufficient, for instance because io_uring completes I/O without issuing a syscall per operation, leaving strace little to trace. eBPF-based tools and perf are suggested as alternatives.

  • "Whats the goto instead of strace, if you wanted to see what was going on?" - boredatoms
  • "I think you have to use eBPF-based tools" - abrookewood
  • "perf and look at stack traces (or off-cpu events for waits/locks). also, ebpf" - fuy

Historical Web Server Architectures and Evolution

The discussion touches upon the evolution of web servers, contrasting older models like CGI and high-process-count Apache with modern event-driven architectures like Nginx. This provides context for why techniques like io_uring are enabling new performance levels.

  • "Your write up connected some early knowledge from when I was 11 where I was trying to set up a database/backend and was finding lots of cgi-bin online. I realize now those were spinning up new processes with each request" - bmcahren
  • "It wasn't just CGI, every HTTP session was commonly a fork"ed copy of the entire server in the CERN and Apache lineage! Apache gradually had better answers, but their API with common addons made it a bit difficult to transition so webservers like nginx took off which are built closer to the architecture in the article with event driven I/O from the beginning." - kev009
  • "If you're gonna do green threads you might as well throw in a GC too and get a whole runtime. And now you're writing Go." - const_cast

Rust's Async Model: Challenges and Trade-offs

A significant portion of the conversation is a detailed debate about Rust's asynchronous programming model, its interaction with io_uring, and the difficulty of achieving safe, zero-cost async I/O, particularly around buffer management and cancellation. A sketch of the buffer-ownership pattern debated here follows the quotes.

  • "The io-uring crate doesn’t help much with this. The API doesn’t allow the borrow checker to protect you at compile time, and I don’t see it doing any runtime checks either. I've seen comments like this before[1], and I get the impression that building a a safe async Rust library around io_uring is actually quite difficult. Which is sort of a bummer." - Seattle3503
  • "I think the right way to build a safe interface around io_uring would be to use ring-owned buffers, ask the ring for a buffer when you want one, and give the buffer back to the ring when initiating a write." - JoshTriplett
  • "This works perfectly well, and allows using the type system to handle safety. But it also really limits how you handle memory, and makes it impossible to do things like filling out parts of existing objects, so a lot of people are reluctant to take the plunge." - Tuna-Fish
  • "There is, I think, an ownership model that Rust's borrow checker very poorly supports, and for lack of a better name, I've called it hot potato ownership. The basic idea is that you have a buffer which you can give out as ownership in the expectation that the person you gave it to will (eventually) give it back to you. It's a sort of non-lexical borrowing problem, and I very quickly discovered when trying to implement it myself in purely safe Rust that the "giving the buffer back" is just really gnarly to write." - jcranmer
  • "The fundamental problem is that rust async was developed when epoll was dominant (and almost no one in the Rust circles cared about IOCP) and it has heavily influenced the async design (sometimes indirectly through other languages)." - newpavlov
  • "No, this is a mistaken retelling of history. The Rust developers were not ignorant of IOCP, nor were they zealous about any specific async model. They went looking for a model that fit with Rust's ethos, and completion didn't fit." - kibwen
  • "Rust is in a strange place because they're a systems language directly competing with C++. Async, in general, doesn't vibe with that but green threads definitely don't." - const_cast
  • "The problem does not exist in the stackfull model by the virtue of user being unable (in safe code) to drop stack of a stackfull task similarly to how you can not drop stack of a thread. If you want to cancel a stackfull task, you have to send a cancellation signal to it and wait for its completion (i.e. cancellation is fully cooperative)." - newpavlov
  • "Deal with it. Async is my greatest disappointment in the otherwise mostly stellar language. And I will continue to argue strongly against it." - newpavlov
  • "It doesn't seem bonkers to me. I know you already know these details, but spelling it out: If I'm using select/poll/epoll in C to do non-blocking reads of a socket, then yes I can use any old stack buffer to receive the bytes, because those are readiness APIs that only write through my pointer "now or never". But if I'm using IOCP/io_uring, I have to be careful not to use a stack buffer that doesn't outlive the whole IO loop, because those are completion APIs that write through my pointer "later"." - oconnor663
  • "The facts that Send/Sync bounds model are still relevant in all the other languages, the absence of Send/Sync just means it's easier to write subtly incorrect code." - K0nserv

Performance and Benchmarking

Several users are keen to see performance metrics and debate how much syscall overhead actually matters for throughput; the arithmetic behind the last quote is spelled out after the list.

  • "I really want to see the benchmarks on this ; tried it like 4 days ago and then built a standard epoll implementation ; I could not compete against nginx using uring but that's not the easiest task for an arrogant night so I really hope you get some deserved sweet numbers ; mine were a sad deception but I did not do most of your implementation - rather simply tried to "batch" calls." - 6r17
  • "I am patient to wait for the benchmarks so take your time ,but I honestly love how the author doesn't care about benchmarks right now and wanted to code to be clean first." - Imustaskforhelp
  • "For TCP streams syscall overhead isn't a big issue really, you can easily transfer large chunks of data in each write(). If you have TCP segmentation offload available you'll have no serious issues pushing 100gbit/s." - dpecked

Kernel Technologies and Low-Level Optimization

Beyond io_uring, users also discuss other kernel-level technologies that contribute to high performance, such as kTLS, DPDK, and AF_XDP. A sketch of what enabling kTLS involves on Linux follows the quote below.

  • "I really don’t understand this argument. If you force the user to transfer ownership of the buffer into the I/O subsystem, the system can make sure to transfer ownership of the buffer into the async runtime, not leaving it held within the cancellable future and the future returns that buffer which is given back when the completion is received from the kernel. What am I missing?" - vlovich123
  • "The facts that Send/Sync bounds model are still relevant in all the other languages, the absence of Send/Sync just means it's easier to write subtly incorrect code." - K0nserv
  • "On FreeBSD, its been in the kernel / openssl since 13, and has been one runtime toggle (sysctl kern.ipc.tls.enable=1) away from being enabled. And its enabled by default in the upcoming FreeBSD-15. We (at Netflix) have run all of our tls encrypted streaming over kTLS for most of a decade." - drewg123

Rust Development Experience (DevEx) and Compile Times

A brief but present theme is the developer experience in Rust, particularly concerning compile times, with some users finding them acceptable and others lamenting their slowness.

  • "I think rusts glacial compile times prevent it from being a useful platform for web apps. Yes it's a nice language, and very performant, but it's horrible devex to have to wait seconds for your server to recompile after a change." - LAC-Tech
  • "What a time to be alived that seconds to recompile is consider horrible devex. At my first job out of college it took 30 minutes to recompile and launch the server. Now the kids complain about 10 seconds." - maeln
  • "Compile times aren’t glacial and will be much faster with the new trait solver and cranelift." - j-krieger

Threading Models and Concurrency

The discussion touches on different threading models, including thread-per-core (TPC), green threads, and Java's virtual threads, comparing their suitability for high-performance systems. A sketch of the core-pinning mechanism behind the thread-per-core quotes follows the list.

  • "Rust: Well yes. Rust does force you to understand the things, or it won't compile. It does have drawbacks. Go: goroutines are not async. And you can't understand goroutines without understanding channels. And channels are weirdly implemented in Go, where the semantics of edge cases, while well defined, are like rolling a D20 die if you try to reason from first principles." - thomashabets2
  • "Isolating a core and then pinning a single thread is the way to go to get both low latency and high throughput, sacrificing efficiency." - gorset
  • "A mistake people make with thread-per-core (TPC) architecture is thinking you can pick and choose the parts you find convenient, when in reality it is much closer to "all or nothing"." - jandrewrogers