Asynchronous Error Handling Is Hard

The Hacker News discussion covers both the philosophy and the practice of error handling, focusing on exceptions versus alternatives such as error codes and explicit return values. The key themes are summarized below.

The Utility and Elegance of Exceptions

A large part of the discussion champions exceptions as the better error-handling mechanism, mainly because they keep "happy path" code uncluttered and capture a stack trace at the point of failure.

Simplifying the Happy Path: Developers expressed a preference for exceptions because they allow code to focus on the normal execution flow without being cluttered by explicit error checks at every step. Rorylaitila states, "I always have global error handler that logs and alerts of anything uncaught. This allows me to code the happy path." This sentiment is echoed by PaulHoule, who recounted an early experience with C: "It was painful seeing how 5 lines of real logic were intertwined with 45 lines of error handling logic that, in the end, did what exceptions did for free."
The Power of Stack Traces: The immediate availability of stack traces at the point of failure is highlighted as an invaluable debugging tool. Bob1029 asserts, "The entire reason exceptions are good is because of stack traces. It is amazing to me how many developers do not understand that having a stack trace at the exact instant of a bad thing is like having undetectable wall hacks in a competitive CS:GO match." Rorylaitila agrees, questioning the alternative: "Yes, I've never quite understood the 'But with exceptions it's hard to debug why the error occurred after the fact, its better to be explicit in advance' - The stack trace points exactly to the line. And usually, with the error message and context, its all I need."
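A minimal Python sketch of the pattern Rorylaitila describes, assuming a program with a single top-level entry point. The business logic contains no error checks, and one handler at the top logs anything uncaught together with its stack trace. The names `load_config`, `happy_path`, and `run` are hypothetical; in a real Python program, `sys.excepthook` can also serve as a last-resort hook for exceptions that escape entirely.

```python
import logging
import traceback

log = logging.getLogger("app")


def load_config(path):
    # May raise FileNotFoundError; there is deliberately no local check.
    with open(path) as f:
        return f.read()


def happy_path():
    # Each step assumes the previous one succeeded.
    config = load_config("/nonexistent/app.conf")
    return config.upper()


def run(entry_point, alert=log.error):
    """Single top-level handler: the only place that catches everything."""
    try:
        return entry_point()
    except Exception:
        # format_exc() renders the traceback of the exception currently being
        # handled, down to the file and line where it was raised.
        alert("uncaught exception\n%s", traceback.format_exc())
        return None
```

Calling `run(happy_path)` logs a traceback whose frames name both `happy_path` and `load_config`. That is Bob1029's point: the trace is captured at the instant of failure, and none of the code in between had to carry it.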

Challenges and Criticisms of Exceptions

Exceptions have drawbacks too. Several users pointed to problems that arise in practice, chiefly opaque errors from libraries, the complications of asynchronous code, and propagating errors across system boundaries.

Opaque Library Errors: A major concern is libraries, especially closed-source ones, that obscure the root cause of an error by swallowing the original exception or throwing a generic one. "Opaque errors from libraries are where this really sucks," admits Rorylaitila. "The worst is when they swallow the original error and throw a generic exception instead." Fliesand7 elaborates on this struggle: "This has been my biggest problem with exceptions, one, for the reason outlined above, plus it's for how much time you actually end up spending on figuring out what the exception for a certain situation is. 'Oh you're making a database insertion, what's the error that's thrown if you get a constraint violation, I might want to handle that'. And then it's all an adventure, because there's no way to know in advance."
Asynchronous Programming Complexities: Asynchronous code complicates exception handling because ownership of the call stack, and with it ownership of the error, is blurred. Rorylaitila's workaround is to wrap "the async code in its own global error handler, so to speak." PaulHoule also emphasizes the thought required: "In the async case you can pass the Exception as an object as opposed to throwing it but you're still left with the issue that the failure of one 'task' in an asynchronous program can cause the failure of a supertask which is comprised of other tasks and handling that involves some thinking." b0a04gl draws the distinction explicitly: "in async code ,errors belong to the task ,not the caller. in sync code ,the caller owns the stack ,so it makes sense they own the error. but async splits that."
Propagating Errors Across Systems: Exceptions can cross process boundaries with technologies like WCF/SOAP, but propagating them reliably, and interpreting them correctly on the other side, takes effort. Bob1029 mentions, "You can even forward the remote stack traces by turning on some scary flags in your app.config."
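The opaque-library complaint, including Fliesand7's constraint-violation example, can be illustrated with Python's standard `sqlite3` module. The sketch assumes a hypothetical library that wraps driver errors in its own `StorageError`. The first wrapper hides which constraint failed; the second uses `raise ... from e`, so the original `sqlite3.IntegrityError` survives as `__cause__` and is printed in the traceback. `StorageError` and both function names are illustrative, not taken from the discussion.

```python
import sqlite3


class StorageError(Exception):
    """Hypothetical library-level error type."""


def insert_user_opaque(conn, name):
    try:
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    except sqlite3.Error:
        # Anti-pattern: `from None` suppresses the original error, so the
        # printed traceback no longer says which constraint failed.
        raise StorageError("insert failed") from None


def insert_user(conn, name):
    try:
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError as e:
        # Wrap with context for the caller, but keep the original as
        # __cause__ so the root error and its stack trace stay visible.
        raise StorageError(f"user {name!r} already exists") from e


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT UNIQUE)")
insert_user(conn, "ada")
```

A second `insert_user(conn, "ada")` raises `StorageError`, whose `__cause__` is the driver's `IntegrityError`. Callers can catch the library's own type and still see the underlying constraint violation, which is also the essence of the wrapping approach discussed in the next section.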

Alternatives and Nuances: Wrapping, Return Codes, and Checked Exceptions

The discussion also explores alternative error handling strategies and the nuances within different approaches, including explicit error wrapping, the use of get_last_error (or errno), and the concept of checked exceptions.

The Role of Wrapping and Metadata: For those who don't use exceptions, error wrapping serves a similar purpose but with a different trade-off. User 9rx explains, "If you aren't using exceptions you are using wrapping instead, and said wrapping is merely an alternative representation of what is ultimately the very same thing." They further explain the benefit: "The benefit of wrapping over exceptions... is that each layer of the stack gains additional metadata to provide context around the whole execution. The tradeoff is that you need code at each layer in the stack to assign the metadata instead of being able to prepare the data structure all in one place at the point of instantiation."
The Problem with Global State (get_last_error/errno): While some appreciate the explicitness of get_last_error-style error handling, the global or thread-local nature of these mechanisms is a significant drawback. O11c points out, "The problem with getlasterror and errno is that they're global (thread-local, whatever)." However, they also suggest a solution: "But if you make them take a context object, there's no longer a problem."
Checked Exceptions, a Divisive Concept: Checked exceptions, which force developers to explicitly handle or declare potential errors, evoke strong opinions. PaulKeeble offers a qualified positive view: "...I quite like declared exceptions. It makes the interface of a method clear in all the ways it can fail and you can directly choose what to handle often without having to look at the docs to work out what they mean, because the names tell you what you need to know." By contrast, PaulHoule cites Rust's struggles with async as a cautionary tale about how hard complex error handling can become. Bigstrat2003 is more enthusiastic: "Yep, checked exceptions are the shit. You can of course abuse them to create a monstrosity (as you can with anything), but when used responsibly I think they are by far the best error handling paradigm." Yet 9rx notes a common criticism: "Checked exceptions were introduced to try to help with that problem, giving you at least a compiler error if an implementation changed from underneath you. But that comes with its own set of problems and at this point most consider it to be a bad idea."
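O11c's fix for `errno`-style reporting can be sketched in Python by passing an explicit error context instead of reading a global. This is a minimal illustration that assumes functions report failure errno-style (a sentinel return value plus a recorded reason); `ErrorContext` and `parse_port` are hypothetical names. Because each caller owns its own context, two callers, even on different threads, cannot overwrite each other's last error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorContext:
    """Hypothetical per-caller replacement for a global/thread-local errno."""
    last_error: Optional[str] = None


def parse_port(ctx: ErrorContext, text: str) -> int:
    """Return a TCP port number, or -1 with the reason recorded in ctx."""
    try:
        port = int(text)
    except ValueError:
        ctx.last_error = f"not a number: {text!r}"
        return -1
    if not 0 < port < 65536:
        ctx.last_error = f"out of range: {port}"
        return -1
    ctx.last_error = None  # clear any stale error on success
    return port
```

With `a = ErrorContext()` and `b = ErrorContext()`, `parse_port(a, "http")` returns -1 and sets `a.last_error` to `"not a number: 'http'"`, while `parse_port(b, "8080")` returns 8080 and leaves `b.last_error` as `None`. The explicitness that some commenters like about `get_last_error` is kept; only the shared global state is gone.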

Structured Concurrency and Isolated Tasks in Asynchronous Programming

For asynchronous code, the discussion turns to patterns and languages that make task ownership and error propagation explicit, moving away from unstructured, free-form task spawning.

Task Ownership in Async: The idea that errors in asynchronous operations should be contained within the task itself, unless the caller has a specific reason to intervene, is a recurring theme. b0a04gl advocates: "write async blocks like isolated tasks. contain errors inside unless the caller has a real decision to make. global error handler picks up the rest." EGreg adds to this by suggesting that side effects should be deferred until all subtasks are confirmed to have succeeded, enabling rollbacks.
Structured Concurrency as a Solution: Structured concurrency, as implemented in libraries like Trio for Python, is presented as a way to address the task ownership problem. Quietbritishjim explains: "Structured concurrency... solves the issue of task (and exception) ownership. In languages / libraries that support it, when spawning a task you must specify some enclosing block that owns it." Because every task has an owner, an exception raised in a child task has a well-defined destination: the enclosing block, which does not exit until all of its children have finished.
Elixir's "Let It Crash" Philosophy: Elixir, building on Erlang's principles, is praised for its approach to asynchronous error handling. Innocentoldguy states, "Asynchronous code used to be the source of many difficult bugs... but Elixir's (or, more accurately, Erlang's) 'let it crash' architecture helps eliminate many of these issues." In this model a failing process is simply allowed to crash, and a supervisor restarts it in a known state, so the failure stays local instead of leaving the rest of the system inconsistent.
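To make the ownership idea concrete, here is a deliberately small nursery-style helper built on Python's `asyncio`, loosely modeled on Trio's nurseries. It is an illustrative sketch, not Trio's implementation: the `Nursery` class is hypothetical, and only its `start_soon` method name mirrors Trio. On Python 3.11+, the standard `asyncio.TaskGroup` offers similar semantics and should be preferred in practice. Every task is owned by the enclosing `async with` block. The block does not exit until its children finish, and if one child fails, its siblings are cancelled and the error surfaces at the block.

```python
import asyncio


class Nursery:
    """Minimal structured-concurrency scope (illustrative, not Trio's class).

    Tasks started here are owned by the enclosing `async with` block, which
    waits for all of them and re-raises the first child failure.
    """

    def __init__(self):
        self._tasks = []

    async def __aenter__(self):
        return self

    def start_soon(self, coro):
        self._tasks.append(asyncio.create_task(coro))

    async def _cancel_all(self, tasks):
        for t in tasks:
            t.cancel()
        # Wait for cancellation to complete so no child outlives the block.
        await asyncio.gather(*tasks, return_exceptions=True)

    async def __aexit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # The body itself failed: cancel the children, then propagate.
            await self._cancel_all(self._tasks)
            return False
        pending = set(self._tasks)
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_EXCEPTION
            )
            failed = [t for t in done if not t.cancelled() and t.exception()]
            if failed:
                await self._cancel_all(pending)
                # A fuller implementation would raise every failure together
                # (e.g. as an ExceptionGroup); this sketch keeps the first.
                raise failed[0].exception()
        return False
```

Used as `async with Nursery() as n:` followed by `n.start_soon(fetch(a))` and `n.start_soon(fetch(b))` (where `fetch` is a hypothetical coroutine), a failure in either fetch cancels the other and is raised at the `async with` statement, the "enclosing block that owns it" in Quietbritishjim's description. It also fits b0a04gl's advice: a child that handles its own errors never disturbs its siblings, and only genuine failures reach the owner.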

Overall, the discussion treats every error-handling strategy as a set of trade-offs. Exceptions are a powerful tool but not a universal one, and asynchronous code in particular keeps pushing best practices to evolve.