Essential insights from Hacker News discussions

The provenance memory model for C

Here's a summary of the themes from the Hacker News discussion:

Unicode in C Identifiers and Source Code

A significant portion of the discussion revolves around the support for Unicode characters within C identifiers, as introduced in later C standards. Users debated the practical implementation of this feature and whether it's a good practice.

  • C Standard's Stance: The C standard has allowed Unicode characters in identifiers since C99, which introduced the \u and \U escape notation (universal character names). C23 ties the permitted characters to the Unicode classes XID_Start and XID_Continue. Quoting cppreference, a user explained: "An identifier is an arbitrarily long sequence of digits, underscores, lowercase and uppercase Latin letters, and Unicode characters specified using \u and \U escape notation (since C99), of class XID_Continue (since C23). A valid identifier must begin with a non-digit character (Latin letter, underscore, or Unicode non-digit character (since C99)(until C23), or Unicode character of class XID_Start (since C23))."
  • Compiler Dependence: In practice, whether Unicode identifiers work depends heavily on compiler support. "In practice depends on the compiler," noted one user. Another user stressed that they were describing the spec rather than endorsing it: "If it were up to me, anything outside the basic character set in a source file would be a syntax error, I'm simply reporting what the spec says."
  • Readability and Intent: Some users found Unicode characters in code, especially in variable names, confusing or a "questionable choice." One user stated, "Definitely a questionable choice to throw off readers with unicode weirdness in the very first code example." Others saw value in Unicode for specific purposes: "I use unicode for math in comments, and think [it] makes certain complicated formulas far more readable."
  • "Basic Character Set" Ambiguity: The definition of a "basic character set" was also a point of contention, with one user suggesting, "What a 'basic character set' is depends on locale." Another user expressed a strong opinion: "Anything except US-ASCII in source code outside comments and string constants should be a syntax error."
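A minimal sketch of the C99 feature quoted above: an identifier spelled with a universal character name. It assumes a compiler with extended-identifier support (recent GCC and Clang accept this by default); the names \u03C0 and circle_area are illustrative only. It also follows one commenter's preferred style of keeping Unicode math inside a comment rather than in identifiers.

```c
#include <assert.h>

/* C99+: identifiers may contain universal character names (UCNs).
 * \u03C0 is GREEK SMALL LETTER PI, a permitted identifier-start character
 * (listed in C99/C11 Annex D; XID_Start under C23). Support varies by compiler. */
static const double \u03C0 = 3.14159265358979323846;

/* Unicode kept to a comment, the style one commenter preferred:
 *   A = π·r²   (area of a circle)
 */
static double circle_area(double r) {
    assert(r >= 0.0);           /* radius must be non-negative */
    return \u03C0 * r * r;
}
```

Writing the identifier with an escape keeps the source file pure ASCII, which sidesteps the "basic character set" concerns raised above, at the cost of making the name harder to read.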

Website and Content Formatting Issues

A recurring theme was the poor formatting and rendering of the linked article, which made it difficult to read and understand.

  • Display Problems: Several users reported display issues, including the page appearing as JSON or being difficult to parse. One user wrote in frustration: "I can't even view the post, I just get some kind of content management system-like with the page as JSON or something, in pink-on-white. I'm super confused. :|"
  • Code Block Errors: Specifically, a code block was not closed properly, so a large section of text was swallowed into it. One user noted: "Looks like a code block didn't get closed properly, before this phrase: 'the functions recip and recip⁺ and not equivalent'. Several paragraphs after this got swallowed by the code block."
  • Need for Proofreading: The presence of these errors led to a call for better post-processing and proofreading. "From the PVI section onward it seems to recover, but if the author sees this please fix and re-convert your post." and "Actually, there are more errors further in the text, this needed proper proofreading before it was posted, I can somewhat struggle through because I already know this topic but if this was intended to introduce newcomers it's probably very confusing."

The Enduring Appeal and Criticisms of C

The discussion touched upon C's persistent relevance while also highlighting its well-known drawbacks and the community's desire for safer, more modern alternatives.

  • C's Strengths: C's omnipresence and foundational role were acknowledged. As one user put it: "On the plus side, it's installed everywhere, and it's not indent sensitive."
  • C's Criticisms: Traditional criticisms of C were voiced, including its syntax, pointer handling, and macros. One user compiled a list:
      CaSe Sensitivity
      Weird pointer syntax
      Lack of a separate assignment token
      Null terminated strings
      Macros - the evil scourge of the universe
  • Desire for Social Acceptance and Safety: Some users expressed a desire for C to be more "socially acceptable" for new projects, implying a need for increased safety and modern features. "I love Rust, but I miss C. If C can be updated to make it generally socially acceptable for new projects, I'd happily go back for some decent subset of things I do. However, there's a lot of anxiety and even angst around using C in production code."
  • Influence of Social Pressure: The role of social pressure in language adoption was also debated, with some arguing against it, while others acknowledged its impact. "Or better yet, don't let 'social pressure' influence your choice of programming language ;)" contrasted with "It’s hard. Programming is a social discipline, and the more people who work in a language, the more love it gets."
  • Alternatives and C's Evolution: Languages like Rust, Zig, Pascal, and Ada were mentioned as alternatives that offer more safety or modern features. Zig, in particular, was highlighted as potentially filling the niche for a safer C-like language. "Feels like Zig is starting to fill that role in some ways. Fewer sharp edges and a bit more safety than C, more modern approach, and even interops really well with C (even being possible to mix the two)." The potential for C itself to evolve and incorporate safer practices was also a theme.
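Of the criticisms listed above, null-terminated strings are the easiest to show concretely: the length is found by scanning for '\0', so an embedded zero byte silently truncates the string. The sketch below contrasts this with a length-carrying view. The str_view type and its helpers are hypothetical, not part of standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string view (not a standard C type):
 * the length is stored, so it never has to be found by scanning for '\0'. */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

/* Assumes arr was initialized from a string literal, so its last byte
 * (included in `size`) is the implicit terminator, which we drop. */
static str_view view_of_array(const char *arr, size_t size) {
    assert(size > 0);
    str_view v = { arr, size - 1 };
    return v;
}

/* Compare by stored length plus memcmp; embedded '\0' bytes are ordinary data. */
static int view_equal(str_view a, str_view b) {
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

For example, with const char x[] = "ab\0cd" and const char y[] = "ab\0ce", strcmp(x, y) returns 0 because both scans stop after "ab", while view_equal on views of the full arrays reports them as different.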

Memory Safety and Pointer Provenance in Programming Languages

Memory safety, and in particular pointer provenance (the subject of the linked article), came up as the main forward-looking technical topic in the discussion.

  • Pointer Provenance: Under a provenance model, a pointer carries not only an address but also the identity of the allocation it was derived from, and an access is valid only if the pointer's provenance covers the object being accessed. One user framed it in capability terms: "provenance model basically turns memory back into a typed value. finally malloc won't just be a dumb number generator, it'll act more like a capability issuer. and access is not 'is this address in range' anymore, but 'does this pointer have valid provenance'. way more deterministic, decouples gcc -wall"
  • Tools and Techniques: LLVM's TySan (a type sanitizer that detects violations of C and C++ strict-aliasing rules) and Fil-C (a memory-safe implementation of C and C++ built on a modified Clang) were cited as efforts to improve memory safety in C and C++.
  • Purpose and Impact: There was some bewilderment about the practical outcomes of pointer provenance: "Will this create more nasal demons? I always disable strict aliasing, and it's not clear to me after reading the whole article whether provenance is about making sane code illegal, or making previously illegal sane code legal."
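To make the provenance idea concrete, here is a minimal sketch. It assumes a typical flat-address platform where uintptr_t exists (it is optional in ISO C) and follows the PNVI-ae-udi reading from the C committee's provenance work, under which casting a pointer to an integer "exposes" its allocation and a later integer-to-pointer cast may pick that provenance back up. The helper names nth and roundtrip are hypothetical. Only well-defined operations are executed; the classic example where two pointers compare equal but only one may be dereferenced appears in a comment, because running it would be undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer arithmetic inside one array: the result keeps the provenance
 * of `base`, so dereferencing is fine as long as it stays in bounds. */
static int nth(const int *base, size_t len, size_t i) {
    assert(i < len);                      /* stay inside the original object */
    return base[i];
}

/* Round-trip through uintptr_t. The pointer-to-integer cast exposes the
 * object's provenance; under PNVI-ae-udi the integer-to-pointer cast may
 * regain it, so the result is usable. Going via void * matches the
 * guarantee ISO C gives for uintptr_t round-trips. */
static int *roundtrip(int *p) {
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}

/* NOT executed: the classic provenance example.
 *   int x = 1, y = 2;
 *   int *p = &x + 1;      one past the end of x: valid to form and compare
 *   int *q = &y;
 *   if (p == q) *p = 10;  undefined: p has x's provenance, not y's,
 *                         even if the two addresses happen to be equal
 */
```

Because the store through p is undefined, a compiler may assume it cannot modify y; that is the kind of optimization a provenance model is meant to justify, and it bears directly on the commenter's question about whether provenance legalizes or forbids existing code.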