Essential insights from Hacker News discussions

I write type-safe generic data structures in C

The discussion revolves around achieving generic programming in C, exploring various techniques and their limitations. Here's a summary of the key themes:

The Core Problem: Lack of True Generics in C

A recurring theme is C's inherent lack of built-in support for generic programming, unlike languages like C++ or D. This forces developers to employ workarounds using macros, unions, and other C features to achieve similar results.

  • As dhooper (the author) states, the article is about "using a union to associate type information with a generic data type."
  • wordglyph expresses this sentiment succinctly by saying, "And that's why I like C++ templates."
  • sltkr lists other features found in other languages that C lacks, including "Or actual generics."
  • Kranar and adastra22 discuss why developers might stick with C despite this, with adastra22 mentioning "Embedded systems, for example" and mikepurvis bringing up "Any established C codebase, for example the kernel or Postgres?".

Macro-Based Generics and their Pitfalls

Many participants discuss using C's preprocessor to create generic implementations. This approach, while powerful, is noted for its complexity and potential for errors.

  • eqvinox mentions sticking to "macro soup probably" for intrusive data structures and wishes for a preprocessor directive like _Include to "help with doing this kind of really long macros, hiding the clunky '#define, #define, #define, #include' inside a macro…".
  • uecker points to a proposal for "multi-line macros" as a potential improvement.
  • zem offers a counter-opinion: "i am firmly of the opinion that compiling to c is a better route than doing clever c tricks to sort of get what you want." zem further elaborates, "The compiler can be pretty minimal and as you note it pays for itself."
  • WalterBright strongly criticizes macro usage: "Why suffer the C preprocessor? Using preprocessor macros is like using a hammer for finish carpentry, rather than a nail gun. A nail gun is 10x faster, drives the nail perfectly every time, and no half moon dents in your work." He shares an anecdote about debugging a complex macro-based MASM program, highlighting the difficulty of understanding undocumented macro-heavy code.

Type Safety and Runtime vs. Compile-Time Checks

Discussions touch on how to maintain type safety in generic C code, with a debate between runtime checks and compile-time guarantees.

  • uecker suggests that for intrusive data structures, "you often want them to erase the type so that you write generic algorithms as regular functions. In this case, it makes more sense to have a run-time check do to down casts."
  • eqvinox counters that for intrusive data structures, runtime checks aren't ideal, and expresses a preference for compile-time checks via "macro soup".
  • layer8 raises concerns about the type safety of function casting: "The casting of the function type assumes that the item pointer type (e.g. Foo) has the same representation as void, which the C standard doesn’t guarantee (in standardese: the two types aren’t “compatible”). Calling the function with the converted type therefore constitutes undefined behavior."
  • dhooper (author) clarifies that "casting is not the core of the type safety. Read the whole article."
  • b0a04gl questions how the type tagging system handles cases where "two types have same size and alignment but different semantics: like int vs float or struct { int a; } vs int? does the type tag system catch accidental reuse."

Intrusive vs. Non-Intrusive Data Structures

A significant aspect of the discussion involves how to implement generics for "intrusive" data structures (where the data structure node is embedded within the data object) versus "non-intrusive" ones (where the data object is stored within the data structure).

  • eqvinox asks uecker about doing this for "intrusive data structures, embedding the node struct in the data".
  • uecker suggests putting a "dummy member into the embedded node" but notes that for generic algorithms, runtime checks might be necessary unless you "erase the type."
  • eqvinox reiterates that embedding doesn't help find the node "mid-struct" and that they "don't immediately see how to do that with unions & typeof."
  • mfuzzey points to the Linux kernel's approach of embedding struct list_head within data structures as an example of "embed the list in the data" rather than the data in the list.
  • variadix notes that "Intrusive data structures are more convenient in C still, but working with them in a debugger is a pain."

C23 Features and Compiler Support

The new features in the C23 standard are mentioned as potential solutions to some of these C generic programming challenges.

  • HexDecOctBin highlights C23's ability to solve type compatibility issues: "C23 solves that too: [link to n3037.pdf]". They note support in GCC and Clang but not MSVC.
  • dhooper confirms this in the article, mentioning MSVC 19.39 as the first to support it, with a Godbolt link to demonstrate its absence in earlier versions.
  • gpderetta comments that "since C23 foo() is now a nullary function," indicating ongoing evolution of the C standard.
  • tedunangst and s3graham engage in a brief debate about which C standard to reference for foo()'s argument list interpretation.

Other Considerations and Alternatives

The discussion also touches upon other aspects of generic programming in C and alternative approaches.

  • asplake mentions working on a "generator of ML-style tagged variants and associated functions in C" and wonders about compatibility.
  • o11c provides detailed critiques of the author's code regarding alignment, type safety via assignment, and the inability to return values from operations.
  • rectang shares their experience building a C extension for a project, emphasizing the boilerplate generated to circumvent C's limitations and why many resort to void* casts.
  • zem advocates for generating C code from a higher-level language as a more robust approach than using C preprocessor macros.
  • monkeyelite suggests avoiding generic data structures altogether and focusing on specialized implementations, noting that "The #1 data structure in any program is array."
  • hgs3 is curious about implementing a generic hashmap, highlighting the challenges of performing operations like hashing and comparison on generic keys.
  • nixpulvis finds the naming convention in the Linux kernel's list initialization confusing, and mfuzzey explains the rationale behind INIT_LIST_HEAD vs. LIST_HEAD_INIT.
  • cherryteastain asks a provocative question: "Why would you jump through all these hoops instead of just writing C++ if you want 'C with generics'?" leading to the responses from Kranar and adastra22 about C's continued relevance in certain domains.
  • layer8 also points out that requiring a C++ toolchain for downstream users of an extension is a significant hurdle.
  • uecker states, "For me, C++ itself is the maze of hoops I would rather want to avoid."
  • D language is presented as an alternative by WalterBright, who argues its ListNode(T) struct is superior to C preprocessor macros for metaprogramming.