The Hacker News discussion revolves around a tool called "What The Fork," which visualizes process creation during builds, primarily for identifying performance bottlenecks. Several themes emerge from the user comments:
The Value of Visualization for Build System Performance
Many users express enthusiasm for the tool and its potential to reveal hidden performance issues in build systems. They highlight that poor or missing visualizations can lead to many problems being overlooked.
- "That's really cool. Fascinating to think about all the problems that get missed due to poor or missing visualizations like this," commented bgirard.
- dhooper, the OP, noted, "My call with the Mozilla engineer was cut short, so we didn't have time to go into detail about what he found, I want to look into it myself."
- bvisness, the engineer who initially used the tool, described the observed issues as "Lots of constant-time slowness at the beginning and end of the build," "Dubious parallelism, especially with unified builds," and generally "a soup of make calls with no particular rhyme or reason."
- "I am stuck in an environment with CMake, GCC and Unix Make (no clang, no ninja) and getting detailed information about WHY the build is taking so long is nearly impossible," lamented Night_Thastus.
- forrestthewoods stated, "This is great. I was skeptical from the title but the implementation is very clever. This could be a super super useful tool for the industry."
- "I love the visualization, I think it's great information and will be very helpful to whoever uses it," said CyberDildonics.
Potential Applications Beyond Compilation
While the tool's primary focus is on understanding build systems, users suggest it could be valuable for a broader range of applications that involve significant process creation.
- supportengineer asked, "What limits your tool to compiler/build tools, can it be used for any arbitrary process?"
- dhooper responded, "Yeah it can be used for any type of program, but I haven't been able to think of anything besides compilation that creates enough processes to be interesting. I'm open to ideas!"
- DiddlyWinks offered, "Video encoding and 3-D rendering are a couple that come to mind; I'd think they'd launch quite a few."
- audiofish commented, "Really cool tool, but perhaps not for the original use-case. I often find myself trying to figure out what call tree a large Bash script creates, and this looks like it visualises it well." They also added, "This would have been really useful 6 months ago, when I was trying to figure out what on earth some Jetson tools actually did to build and flash an OS image."
Direct Comparison and Alternatives to Build Systems
The discussion frequently touches upon the inefficiencies of common build systems like CMake and Make, drawing comparisons to more performant alternatives like Ninja.
- bvisness characterized the analyzed build as a "far cry from the ninja example the OP showed in his post."
- Night_Thastus mentioned being "stuck in an environment with CMake, GCC and Unix Make (no clang, no ninja)."
- tiddles noted with their "huge catkin cmake project that cmake is checking the existence of the same files hundreds of times too. Is there anything that can hook into fork() and provide a cached value after the first invocation?"
- lights0123 provided specific tips: "- switch to ninja to avoid that exact issue since CMake + Make spawns a subprocess for every directory (use the binary from PyPi for jobserver integration)."
- boris raised a point about Ninja's performance, quoting, "It also has 6 seconds of inactivity before starting any useful work. For comparison, ninja takes 0.4 seconds to start compiling the 2,468,083 line llvm project. Ninja is not a 100% fair comparison to other tools, because it benefits from some “baked in” build logic by the tool that created the ninja file, but I think it’s a reasonable “speed of light” performance benchmark for build systems." They further elaborated on the efficiency of build2 compared to CMake+Ninja.
- mgaunard suggested, "The real solution is to eliminate build systems where you have to define your own targets. Developers always get it wrong and do it badly."
- tom_ recommended, "If you use the Visual C++ compiler on Windows, vcperf is worth a look: https://github.com/microsoft/vcperf - comes with VS2022, or you can build from github. I've used it with projects generated by UBT and CMake."
Interest in Advanced Features and Platform Support
Users expressed interest in specific functionalities and asked about the availability of the tool on different operating systems.
- xuhu inquired, "Is there a tool that records the timestamp of each executed command during a build, and when you rebuild, it tells you how much time is left instead of "building obj 35 out of 1023"?"
- dhooper indicated the possibility: "What The Fork knows all the commands run, and every path they read/write, so I should be able to make it estimate build time just by looking at what files were touched."
- aanet asked, "Is there a version available for MacOS today?? I'd love to give it a whirl... For Rust, C++ / Swift and other stuff."
- dhooper confirmed, "I'll be sending out a macOS version to another wave of beta users after I fix an outstanding issue..."
- Night_Thastus observed, "It looks like it doesn't have a public release for any OS yet, but has a way to enter for early access."
- brcmthrowaway questioned, "What about OSes that dont use fork()?" to which dhooper replied, "I use whatever the equivalent is on that OS."
- jeffbee suggested, "This seems like a good place to integrate a Bazel Build Event Protocol stream consumer."
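The time-remaining estimate that xuhu asks about can be sketched in a few lines. This is only an illustration of the general idea, not how What The Fork works or would implement it: the function name, the fallback value for unseen commands, and the sample timings are all hypothetical.

```python
from typing import Dict, Iterable


def estimate_remaining_seconds(
    recorded_durations: Dict[str, float],
    pending_commands: Iterable[str],
    default_seconds: float = 1.0,
) -> float:
    """Sum the previously recorded duration of every command still to run.

    Commands that were not seen in the earlier build fall back to
    `default_seconds` (an arbitrary assumption). Durations are simply
    added up, so the result ignores any parallelism in the build.
    """
    return sum(
        recorded_durations.get(command, default_seconds)
        for command in pending_commands
    )


# Hypothetical timings recorded during a previous build, in seconds.
previous_build = {
    "cc -c parser.c": 4.0,
    "cc -c lexer.c": 2.5,
    "cc -o app parser.o lexer.o": 0.5,
}

# Suppose parser.c has already been compiled; two commands remain.
remaining = ["cc -c lexer.c", "cc -o app parser.o lexer.o"]
print(estimate_remaining_seconds(previous_build, remaining))  # 3.0
```

Because the durations are summed serially, the figure will likely overestimate wall-clock time for a parallel build; a real tool would also need to decide which commands a rebuild will actually run.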
Historical Context and Similar Tools
Some users shared their experience with similar tools developed previously, highlighting both the value and the inherent technical challenges.
- unddoch mentioned, "I wrote a little GCC plugin for compile time tracing/profiling, if that's something you're interested in: https://github.com/royjacobson/externis"
- entelechy referenced their past work: "We did something similar using strace/dtruss back in 2018 with https://buildinfer.loopperfect.com/ and were generating graphs (using eg. graphviz and perfetto.dev) and BUCK files on the back of that." They also outlined challenges: "- syscall logs can get huge - especially when saved to disk. Our strace logs would get over 100GB for some projects (llvm was around ~50GB)" and "- It's runtime analysis - you might need to repeat the analysis for each configuration."
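As a rough illustration of the strace-based approach entelechy describes, the sketch below rebuilds a process tree from a syscall log. It assumes each line has the shape `<pid> <syscall>(<args>) = <result>`, roughly what `strace -f -o build.log` writes; real logs contain further variants (unfinished and resumed calls, signals) that are ignored here. The sample log is hand-written for the example, not real output, and none of this reflects how buildinfer or What The Fork is implemented.

```python
import re
from collections import defaultdict

# Assumed line shape: "<pid> <syscall>(<args>) = <result>".
LINE = re.compile(r"^(\d+)\s+(\w+)\((.*)\)\s+=\s+(-?\d+)")


def build_process_tree(lines):
    """Return (children, programs) extracted from strace-style log lines."""
    children = defaultdict(list)  # parent pid -> list of child pids
    programs = {}                 # pid -> path of the last program it exec'd
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # skip anything that does not fit the assumed shape
        pid, syscall, args, result = int(match[1]), match[2], match[3], int(match[4])
        if syscall in ("fork", "vfork", "clone") and result > 0:
            children[pid].append(result)  # a positive result is the child's pid
        elif syscall == "execve" and result == 0:
            programs[pid] = args.split('"')[1]  # first quoted string is the path
    return children, programs


def format_tree(pid, children, programs, depth=0):
    """Render the tree rooted at `pid` as a list of indented lines."""
    rows = ["  " * depth + f"{pid} {programs.get(pid, '?')}"]
    for child in children.get(pid, []):
        rows.extend(format_tree(child, children, programs, depth + 1))
    return rows


# Hand-written illustrative log, not captured from a real build.
sample_log = [
    '100 execve("/usr/bin/make", ["make"], 0x7ffd /* 20 vars */) = 0',
    '100 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|SIGCHLD) = 101',
    '101 execve("/usr/bin/cc", ["cc", "-c", "main.c"], 0x55d /* 20 vars */) = 0',
    '100 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|SIGCHLD) = 102',
    '102 execve("/usr/bin/cc", ["cc", "-c", "util.c"], 0x55d /* 20 vars */) = 0',
]

children, programs = build_process_tree(sample_log)
print("\n".join(format_tree(100, children, programs)))
```

Processing the log line by line keeps memory use independent of log size, which matters given the log sizes entelechy reports.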
Compiler Optimizations and Build Time Correlation
A side discussion emerged regarding the relationship between compiler effort, binary size, and compile time.
- phaedrus shared, "When I was trying to improve compile time for my game engine, I ended up using compiled size as a proxy measure. Although it is an imperfect correlation, the fact that compiled size is deterministic across build runs and even across builds on different machines makes it easier to work with than wall clock time."
- mlsu questioned, "Wait, this is not intuitive at all for me. If the compiler is working harder wouldn't that result in a more compact binary? Maybe I'm thinking too much from an embedded software POV. I suppose the compiler does eventually do IO but IO isn't really the constraint most of the time right?"
- staticfloat clarified, "So it's not a perfect proxy, but in general, if the output of your compiler is 2MB of code, it probably took longer to process all the input and spit out that 2MB than if the output of your compiler was 200KB."
- johannes1234321 added, "for most parts of the code I would assume there is a proportion from code size, via compile time to binary size."
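The proxy measure phaedrus describes can be collected with very little code. The following is a minimal sketch, assuming that "compiled size" means the combined size of object files under a build directory; the function name, the `.o` suffix, and the demo files are hypothetical, and phaedrus does not say how the measurement was actually taken.

```python
import tempfile
from pathlib import Path


def total_object_size(build_dir, suffix=".o"):
    """Return the combined size in bytes of all files under build_dir ending in suffix."""
    return sum(
        path.stat().st_size
        for path in Path(build_dir).rglob(f"*{suffix}")
        if path.is_file()
    )


# Demo on a throwaway directory standing in for a real build tree.
with tempfile.TemporaryDirectory() as build_dir:
    (Path(build_dir) / "a.o").write_bytes(b"\0" * 2048)
    (Path(build_dir) / "b.o").write_bytes(b"\0" * 1024)
    (Path(build_dir) / "notes.txt").write_bytes(b"ignored")
    print(total_object_size(build_dir))  # 3072
```

Tracking this number across commits gives a repeatable signal, whereas wall-clock timings of the same build vary from run to run.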
Naming and Legitimacy
A minor point of discussion was the tool's name, "What The Fork."
- CyberDildonics suggested, "I would think about a different name. Often names are either meant to be funny or just unique nonsense but something short and elegantly descriptive (like BuildViz etc.) can go a long way to making it seem more legitimate and being more widely used."
- dhooper responded, "Thanks CyberDildoNics!"
- hiccuphippo quipped, "Name checks out."
- metalliqaz pointed out, "Isn't wtf already a fairly common command?"