Essential insights from Hacker News discussions

Fil's Unbelievable Garbage Collector

This Hacker News discussion centers on the Fil-C project, a tool that aims to make C programs memory-safe. The conversation touches on several key themes:

The Promise of Memory Safety in C

A significant portion of the discussion revolves around the core goal of Fil-C: providing memory safety for existing C code. Users express excitement and curiosity about this ambitious undertaking.

  • "Oh this is so cool," exclaims reactordev.
  • kerkeslager notes, "I am glad someone is doing something with that [referencing a paper on memory safety] in a non-academic (maybe?) context."
  • jandrewrogers finds it "really cool that someone is going hard at this part of the design space from an engineering geek standpoint."
  • crawshaw emphasizes the project's value as an "existence proof": "This is the sort of technique that is very effective for real programs, but that developers are convinced does not work. Existence proofs cut through long circular arguments."

Performance Trade-offs

A major point of debate is the performance overhead introduced by Fil-C. Users are concerned about how much slower programs will run compared to their native C counterparts.

  • johncolanduoni asks, "What do the benchmarks look like? My main concern with this approach would be that the performance envelope would eliminate it for the use-cases where C/C++ are still popular."
  • pizlonator, the project author, offers a range: "Some programs run as fast as normally. That's admittedly not super common, but it happens. Some programs have a ~4x slowdown. That's also not super common, but it happens. Most programs are somewhere in the middle."
  • Sesse__ provides specific, and in some cases much higher, slowdown figures: "FWIW, I just tested it on a random program I wrote recently, and it went from 2.085 seconds with Clang+jemalloc to 18.465 seconds with Fil-C. (No errors were reported, thank goodness!) So that's a 9x new worst case for you :-)" and "And on the next one, a SIMD-heavy searcher thingie... it went from 7.723 to 379.56 seconds, a whopping 49x slowdown."
  • foldr reports a similarly large slowdown for a specific use case: "I find that with the latest release, a Fil-C build of QuickJS executes bytecode around 30x slower than a regular build." pizlonator acknowledges this: "30x? Oof... I know that I regressed quickjs recently when I fixed handling of unions. It’s a fixable issue..."
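
The slowdown factors quoted in these comments are the Fil-C runtime divided by the baseline runtime. As a minimal sketch in C (the function names are illustrative and do not come from the thread), the arithmetic behind the two measurements Sesse__ reported can be checked like this:

```c
#include <stdio.h>

/* Hypothetical helper: ratio of the Fil-C runtime to the baseline runtime. */
double slowdown_factor(double baseline_seconds, double filc_seconds) {
    return filc_seconds / baseline_seconds;
}

/* Print the two measurements quoted in the thread as slowdown factors. */
void print_reported_slowdowns(void) {
    printf("first program: %.1fx\n", slowdown_factor(2.085, 18.465));
    printf("SIMD searcher: %.1fx\n", slowdown_factor(7.723, 379.56));
}
```

Calling print_reported_slowdowns() from a main function prints roughly 8.9x and 49.1x, consistent with the "9x" and "49x" figures in the comments.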

The "Myth" of C/C++ Performance Sensitivity

The project author, pizlonator, challenges the common perception that C/C++ is chosen solely for performance, arguing that most C/C++ code isn't performance-sensitive in a way users would notice.

  • pizlonator claims, "This is a myth. 99% of the C/C++ code you are using right now is not perf sensitive. It's written in C or C++ because: That's what it was originally written in and nobody bothered to write a better version in any other language... The code depends on a C/C++ library and there doesn't exist a high quality binding for that library in any other language... C/C++ provides the best level of abstraction (memory and syscalls) for the use case."
  • He elaborates, "When I say 99% of the C code you use, I mean “use” as a human using a computer, not “use” as a dependency in your project. I’m not here to tell you that your C or C++ project should be compiled with Fil-C. I am here to tell you that most of the C/C++ programs you use as an end user could be compiled with Fil-C and you wouldn’t experience an degraded experience if that happened."
  • mike_hearn agrees: "You're absolutely right about all of this. People under-estimate how much code gets written in these languages just because decades ago they were chosen as the default language of the project and people are resistant to going full polyglot."
  • However, others disagree, pointing to specific domains:
    • julieeee counters: "Since performance is largely correlated to battery life, of course I would notice. An Nx reduction in battery life would certainly be a degraded experience."
    • johncolanduoni argues, "While there are certainly other reasons C/C++ get used in new projects, I think 99% not being performance or footprint sensitive is way overstating it. There's tons of embedded use cases where a GC is not going to fly just from a code size perspective, let alone latency... if Chrome gets 2x slower I'll finally switch back to Firefox. That's tens of millions of lines of performance-sensitive C++ right there."
    • saagarjha states, "Yes, people will absolutely notice. There's plenty of interactions that take 500ms that will now take a second."
    • spacechild1 adds, "DAWs and audio plugins... video editors. Audio plugins in particular need to run as fast as possible because they share the tiny time budget of a few milliseconds with dozens or even hundreds of other plugins instances."

Technical Choices and Alternatives

The discussion delves into specific technical decisions made by Fil-C and compares them to alternative approaches.

  • Garbage Collection (GC): Fil-C uses a concurrent GC.

    • pcfwik questions the choice of a full GC over "lock-and-key style temporal checking," suggesting the latter would be more predictable and avoid GC overhead.
    • pizlonator defends the GC by highlighting its thread-safety: "the capability model is totally thread safe and doesn’t require fancy atomics or locking in common cases."
    • kragen raises concerns about the GC's latency guarantees: "'Concurrent' doesn't usually mean 'bounded in worst-case execution time', especially on a uniprocessor. Does it in this case?" pizlonator clarifies: "Meh. I was in the real time GC game for a while... nobody agrees on what it really means to bound the worst case... Games: I bound worst case execution time, just assuming a fair enough OS scheduler, even on uniprocessor. Audio: I bound worst case execution time if you have multiple cores. Flight: I don't bound worst case execution time."
    • Regarding performance, pizlonator estimates GC overhead: "I would estimate 2x. Fil-C has additional overheads not related to GC, so maybe it’s higher."
  • Undefined Behavior (UB) and LLVM Optimizations: The project's approach to handling C's notorious undefined behavior, particularly concerning pointer arithmetic, is discussed.

    • pcfwik asks, "Do you disable such optimizations inside LLVM, or does Fil-C avoid this entirely by breaking pointers into pointer base + integer offset?"
    • pizlonator explains their strategy: "llvm is a lot less willing to exploit that UB... I run a highly curated opt pipeline before instrumentation happens. FilPizlonator drops flags in LLVM IR that would have permitted downstream passes to perform UB driven optimizations... I made some surgical changes to clang CodeGen and some llvm passes to fix some obvious issues from UB." This is summarized as the "GIMSO property (garbage in, memory safety out)."
  • Pointer Capabilities and Provenance: The concept of "pointer capabilities" and how Fil-C handles pointer arithmetic and "provenance" is a technical highlight.

    • pizlonator explains that when pointers are converted to integers and stored, "they lose their capability. So accesses to them will trap and the GC doesn’t need to care about them."
    • charleslmunger shares an example of an idiom used in Protocol Buffers that relies on pointer provenance, and pizlonator confirms, "Yeah that should just work. Code that strives to preserve provenance works in Fil-C."
  • Comparison to Other Safety Mechanisms: Fil-C is compared to other security and sandboxing technologies.

    • Asked about RLBox, pizlonator differentiates: "RLBox is a containerization technology. Fil-C is a memory safety technology... Fil-C does memory safety without containerizing while RLBox does containerization without memory safety."
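
To make the provenance point concrete, here is a hedged sketch in C of a low-bit pointer-tagging idiom written two ways. It is a generic illustration, not the Protocol Buffers code charleslmunger posted, and every function name is hypothetical. The first variant derives the tagged and untagged pointers from the original pointer with char * arithmetic, so the value stays a pointer throughout. The second rebuilds the pointer from a masked integer. On a conventional compiler both behave the same. Going by pizlonator's comments, the first style is the kind that "strives to preserve provenance"; how Fil-C treats the second form is an open assumption, since the thread does not spell it out.

```c
#include <stdint.h>

/* Read the low bit of a pointer. The integer is only inspected,
   never turned back into a pointer. */
unsigned get_tag(void *p) {
    return (unsigned)((uintptr_t)p & 1u);
}

/* Provenance-preserving: set the tag by moving the pointer itself.
   Assumes p is at least 2-byte aligned and points to at least 2 bytes. */
void *tag_ptr(void *p, unsigned tag) {
    return (char *)p + (tag & 1u);
}

/* Provenance-preserving: clear the tag by moving the pointer back. */
void *untag_ptr(void *p) {
    return (char *)p - get_tag(p);
}

/* Integer round trip: the result is manufactured from an integer.
   Works on mainstream platforms; under a capability model the link to
   the original allocation may be lost (assumption, see lead-in). */
void *untag_via_integer(void *p) {
    return (void *)((uintptr_t)p & ~(uintptr_t)1);
}
```

The design choice in the first variant is to use the integer conversion only for reading bits and to let pointer arithmetic produce every pointer that is later dereferenced.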

Use Cases and Applicability

The conversation touches on what types of software Fil-C is best suited for and its limitations.

  • Embedded Systems: The practicality of GC in resource-constrained embedded environments is debated.

    • kragen notes, "If you're worried about the code size of a GC you probably don't have a filesystem."
    • pizlonator acknowledges, "Yeah totally, if you're in those kinds of environments, then I agree that a GC is a bad choice of tech." They also mention the possibility of an "InvisiCaps" approach for embedded systems that avoids GC.
  • Cross-platform Support: kragen points out the current lack of 32-bit support: "Fil-C doesn't yet support 32-bit systems."

  • The Role of C/C++: The fundamental reasons for C/C++'s continued use are explored beyond just performance.

    • pizlonator reiterates, "It['s] code that would be zero fun to try to write in anything other than C or C++," citing complex syscall usage.
    • conradev adds, "code size, toolchain availability and the universality of the C ABI are more good reasons for why code is written in C besides runtime performance."

Project Goals and Future

The author's approach to development and future aspirations are also mentioned.

  • pizlonator is praised for their "honest, no-nonsense, not full of marketing-speak reply."
  • When asked about future performance goals, pizlonator states, "I think that worst case 2x, average case 1.5x is attainable," adding, "Code that uses SIMD or that is mostly dealing with primitive data in large arrays will get to close to 1x."
  • The potential for Fil-C to be used with "proof carrying code" for enhanced security is discussed, though with skepticism about practical adoption.

Reimplementing Core Software in Other Languages

A side discussion emerges about the trend of rewriting foundational C projects (like Python, SQLite, Bash) in languages like Rust, and how this compares to using Fil-C. Some users provide examples of these reimplementations, while others express skepticism about their maturity or the effectiveness of Rust's unsafe blocks.