Essential insights from Hacker News discussions

Formatting code should be unnecessary

Here's a summary of the themes expressed in the Hacker News discussion:

The Unix Philosophy and Plain Text as a Common Denominator

A central theme is the enduring strength of the Unix philosophy, which prioritizes composability and the use of plain text for data interchange. This approach is seen as a powerful "lowest common denominator" that ensures broad compatibility.

  • "Anything but text makes grep, diff, sed, and version control less effective. You end up locked into specialized tools, formats, or IDE extensions, while the Unix philosophy thrives on composability with plain text." - kelseyfrog
  • "This seems like one of the many cases where unix won out by being a lowest common denominator. Every platform can handle plain text." - lmm
  • "The lowest common denominator rather is binary blobs. :-)" - aleph_minus_one
  • "The conversion of which to text and back has historically proven rather fraught." - thfuran
  • "Perhaps this is rather a design mistake in how UNIX handles things and is so focused on text." - aleph_minus_one
  • "All your examples work better for code with structural knowledge: [...] hide noisy syntax only changes, attempt to capture moved code." - gr__or

The Case for Automated Code Formatting (e.g., gofmt)

A strong counter-argument is made for the benefits of automated code formatters, which enforce a consistent style across a project, eliminating "bikeshedding" and disputes over formatting preferences.

  • "'Gofmt's style is no one's favorite, yet gofmt is everyone's favorite' is solid. Pick a not-unreasonable standard, enforce it, and move on to more important things." - scubbo
  • "Arguing and obsessing about code formatting is simply useless bikeshedding." - jsharpe
  • "Go was designed at Google with a built in style checker to explicitly address this and prevent bikeshedding." - __loam
  • "The goal of having every developer viewing the code with their own preferences just isn't that important." - jsharpe
  • "This is so future 'git diff's when adding new parameters don't look bad" - thewisenerd
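The diff point can be made concrete with a small sketch. It assumes the comment refers to the common formatter layout of one parameter per line with a trailing comma (the thread excerpt does not say so explicitly), and the helper name changed_lines and the sample call are made up for illustration.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return the removed ('-') and added ('+') lines between two texts."""
    a, b = before.splitlines(), after.splitlines()
    out = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":
            out += ["-" + line for line in a[i1:i2]]
            out += ["+" + line for line in b[j1:j2]]
    return out


# Without a trailing comma, adding a parameter also touches the previous line.
no_trailing_before = "call(\n    first,\n    second\n)"
no_trailing_after = "call(\n    first,\n    second,\n    third\n)"

# With a trailing comma, only the new line shows up in the diff.
trailing_before = "call(\n    first,\n    second,\n)"
trailing_after = "call(\n    first,\n    second,\n    third,\n)"

if __name__ == "__main__":
    print(changed_lines(no_trailing_before, no_trailing_after))
    print(changed_lines(trailing_before, trailing_after))
```

In the first case the diff reports three changed lines, in the second only one, which is the kind of noise reduction a consistent formatter style is meant to buy.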

The Debate Between Textual and Abstract Syntax Tree (AST) Representations

Much of the discussion revolves around the trade-offs between storing source code as plain text versus using a more abstract representation like an Abstract Syntax Tree (AST). The proponents of ASTs argue for better structural understanding, while text-based proponents highlight the accessibility and interoperability of traditional tools.

  • "If we had a formatting tool that operated solely on AST, checked in code could be in a canonical form for a given AST. Editors could then parse the AST and display the source with a different formatting of the users choice, and convert to canonical form when writing the file to disk." - accelbred
  • "Nobody wants to have to run their own formatter rules in reverse in their head just to know what to grep for. That defeats the point of formatting at all." - sublinear
  • "The things all being described are way beyond non trivial to solve, and they'd need to be solved for every language. Grep works great." - komali2
  • "We still use grep because its useful. And it's useful precisely because it doesn't depend on syntax so will work on anything text based." - account42
  • "You'd need all-new tools for non-text world as well. [...] At least in the first case, you still have fall back." - theamk
  • "The plain text encoding itself exists in a process of incremental, path-dependent development from Morse Code signals to Unicode resulting in a "Gigantic Lookup Table" (GLUT, my coining) approach to symbolic comprehension." - crq-yml
  • "Even that is not without its cost. Most of these tools are written in different languages, which all have to maintain their own parsers, which have to keep up with language changes." - gr__or
  • "The complexity of a parser is orders of magnitude higher than that of an AST schema." - gr__or
  • "The reason people keep source code as text as it's really a global maximum. The non-text format gives you a modest speedup, but at the expense of imposing incredible version compatibility pain." - theamk
  • "What the actual argument is, that if we could change the representation of source code, but still allow it to be text, then we would still be able to use grep, sed, diff etc." - kesor
  • "What I would be curious on is tracing from errors back to the source code. Nearly every language I’ve used prints line number and offset on the line for the error. How that worked in the Diana world would be interesting to learn." - cowsandmilk
  • "This isn’t something you can brute-force — it needs careful planning and design before implementation. The train started on text rails and won’t stop, so the only way forward is to build an alternative track and make switching both gradual and worthwhile." - toaster
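The "canonical form for a given AST" idea can be sketched with Python's standard ast module (ast.unparse needs Python 3.9 or later). This is only an illustration under simplifying assumptions, not what any commenter proposed in detail: the function names canonical and same_structure are hypothetical, and ast.unparse discards comments and any deliberate layout, which is exactly the loss some commenters worry about.

```python
import ast


def canonical(source: str) -> str:
    """Parse the source and re-emit it in the single layout ast.unparse produces."""
    return ast.unparse(ast.parse(source))


def same_structure(a: str, b: str) -> bool:
    """True if both sources parse to the same tree, ignoring layout and positions."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


# Two personal styles of the same function.
style_a = "def area(w,h):\n        return w*h\n"
style_b = "def area(\n    w,\n    h,\n):\n    return (w * h)\n"

if __name__ == "__main__":
    print(canonical(style_a) == canonical(style_b))
    print(same_structure(style_a, style_b))
```

Both styles collapse to one stored text, so a diff between them would be empty. The cost side of the debate still applies: every language needs its own parser and printer, and the round trip is only as lossless as the tree is.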

The Role of Typography and Readability in Code Formatting

Some users emphasize the typographical and aesthetic aspects of code formatting, arguing that it can enhance meaning and structure, and that automated formatters sometimes sacrifice this for consistency.

  • "There’s also a typography element to formatting source code. The notion that all code formatting is mere personal preference isn’t true. Formatting code a certain way can help to communicate meaning and structure. This is lost when the minimal tokens are serialized and re-constituted using an automated tool." - davetron5000
  • "I want to be able to quickly skim function names and then read arguments only if deemed relevant. I don’t want to read every single word." - forrestthewoods
  • "These feel like pretty trivial routines that can be encompassed by code formatting. We can contrive more extreme examples, like the for loop, but super custom formatting ("typesetting") like that has always made me feel awkward, feels like it gives license for people to use all manners of arbitrary formatting." - jauntywundrkind
  • "I find this arrangement much more readable than the short-line equivalent:" (demonstrating tabular alignment for function arguments) - elevation
  • "I’ve often wished that formatters had some threshold for similarity between adjacent lines. If some X% of the characters on the line match the character right above, then it might be tabular and it could do something to maintain the tabular layout." - a_e_k
  • "I still prefer 80. I won’t (publicly) scoff at 100 though. IMO 120 is reasonable for HTML and Java, but that’s about it." - skinner927
  • "For C++ headers I absolutely despise verbose doxygen bullshit comments spreading relatively straightforward functions across 10 lines of comments and args." - forrestthewoods
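The similarity-threshold wish can be sketched as a simple column comparison. This is a guess at one possible reading of the heuristic, not an existing formatter feature: the names column_match_ratio and looks_tabular, the default threshold of 0.6, and the sample rows are all assumptions.

```python
def column_match_ratio(upper: str, lower: str) -> float:
    """Fraction of positions in `lower` whose character equals the one directly above."""
    if not lower:
        return 0.0
    matches = sum(1 for above, below in zip(upper, lower) if above == below)
    return matches / len(lower)


def looks_tabular(lines: list[str], threshold: float = 0.6) -> bool:
    """True if every adjacent pair of lines meets the similarity threshold."""
    pairs = list(zip(lines, lines[1:]))
    return bool(pairs) and all(
        column_match_ratio(a, b) >= threshold for a, b in pairs
    )


# Hand-aligned rows share most characters column by column.
table_rows = [
    '{"north",  0,  1},',
    '{"south",  0, -1},',
]

# Ordinary consecutive statements do not.
plain_rows = [
    "total = compute_total(items)",
    "if total > limit:",
]

if __name__ == "__main__":
    print(looks_tabular(table_rows), looks_tabular(plain_rows))
```

A formatter could use such a check to leave a block alone instead of reflowing it. The threshold is sensitive to row content (rows with different word lengths score much lower), so a real implementation would likely need something more robust than a single percentage.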

The Tab vs. Spaces Debate and Its Significance

Although tabs vs. spaces is a recurring topic, the discussion largely treats it as a minor point, especially when contrasted with the broader question of who controls formatting.

  • "There's a scissor that cuts through the formatting debate: If initial space width was configurable in their editor of choice, would those who prefer tabs have any other arguments?" - kelseyfrog
  • "Yes, of course, because tab width is *dynamically* flexible, so initial space width isn't enough" - eviks
  • "Yes because if you want to deindent with tabs it is just delete one character whilst spaces requires you to delete x characters where x is the number of spaces you indent by." - pasc1878
  • "The tab width setting is for readability. Indent levels are fixed. The problem comes when you need to align things programmatically." - eviks
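The alignment problem can be shown with str.expandtabs, which renders a line at a chosen tab width. The sample lines and the helper names equals_columns and is_aligned are illustrative assumptions.

```python
def equals_columns(lines: list[str], tab_width: int) -> list[int]:
    """Column of the first '=' on each line once tabs are rendered at tab_width."""
    return [line.expandtabs(tab_width).index("=") for line in lines]


def is_aligned(lines: list[str], tab_width: int) -> bool:
    """True if the '=' signs line up at this tab width."""
    return len(set(equals_columns(lines, tab_width))) == 1


# Tabs used both for indentation and for alignment.
tab_aligned = ["\tx\t= 1", "\ttotal\t= 2"]

# Tabs for indentation, spaces for alignment.
space_aligned = ["\tx     = 1", "\ttotal = 2"]

if __name__ == "__main__":
    for width in (2, 4, 8):
        print(width, is_aligned(tab_aligned, width), is_aligned(space_aligned, width))
```

The tab-aligned pair lines up at width 8 but not at 2 or 4, while the mixed style stays aligned at every width. Indentation depth survives a change of tab width; alignment done with tabs does not.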

The Practicality and Evolution of Tools

Commenters weigh the practicality of proposed solutions, the limitations of current tools (like Git), and the desire for more intelligent, data-aware tooling.

  • "It’s a really good idea and I can’t believe people here are complaining that it’s bad because they can’t use grep! But that’s a good thing!! Who the hell is grepping code as if code had no structure and that’s the best you can do?" - brabel
  • "My personal preference is to use a formatter like Black for Python, and then to use blackened to ensure that the code is always indented according to my preferences." - jsharpe
  • "Smudge and clean filters work on text, git would not need to change at all. You would still store text, and still check out text, just transformed text. You could still check in anything you want, including partial code, syntax errors, or any other arbitrary text. Diffs would work the same way they do now." - hendrikto
  • "I can only recommend difftastic[1], which is a language aware diff. Independent of linter that shows the logical diff, not an assortment of characters or lines that changed." - peanball
  • "What I’d really like to see is a viable projectional editor and a broader shift from text-centric to data-centric tools." - toaster
  • "Git can use arbitrary merge (and diff) tools. Something like https://mergiraf.org/introduction.html works with git and gets you ast aware merging. Do not underestimate gits flexibility." - zokier
  • "You'd have to run diff and sed before the formatter which is harder for everyone." - sublinear
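The smudge and clean idea can be sketched as a pair of pure text transforms. This is a minimal illustration that assumes the repository stores tab indentation and each developer views it as spaces of a preferred width; the function names and that storage convention are assumptions, not something the thread specifies. Git would run such transforms through a filter driver configured via filter.<name>.clean and filter.<name>.smudge and selected in .gitattributes, with the text passed on standard input and output.

```python
import re


def smudge(text: str, width: int) -> str:
    """Repository -> working tree: show each leading tab as `width` spaces."""
    return re.sub(
        r"^\t+",
        lambda m: " " * (width * len(m.group())),
        text,
        flags=re.M,
    )


def clean(text: str, width: int) -> str:
    """Working tree -> repository: turn leading runs of `width` spaces into tabs."""

    def to_tabs(m: re.Match) -> str:
        levels, rest = divmod(len(m.group()), width)
        # Leftover spaces that do not fill a whole level are kept as-is.
        return "\t" * levels + " " * rest

    return re.sub(r"^ +", to_tabs, text, flags=re.M)


# Deliberately incomplete code: the filters never parse it.
stored = "if x:\n\tif y:\n\t\treturn {\n"

if __name__ == "__main__":
    local = smudge(stored, 2)
    print(repr(local))
    print(clean(local, 2) == stored)
```

Because both directions operate on plain text, partial code and syntax errors pass through unchanged. The round trip is only exact when leading whitespace in the stored file consists purely of tabs; mixed tab-and-space alignment would need a smarter transform.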

The "Lowest Common Denominator" vs. "Technical Debt" Argument

There's a tension between embracing the simple, widely compatible "lowest common denominator" of plain text and the technical debt incurred by not using more sophisticated, structure-aware representations.

  • "The plain text encoding itself exists in a process of incremental, path-dependent development from Morse Code signals to Unicode resulting in a "Gigantic Lookup Table" (GLUT, my coining) approach to symbolic comprehension." - crq-yml
  • "This is it, unfortunately git is "too dumb" for this. In order to merge code, it would have to either understand the AST. What happens when you stage the line } else return {? git doesn't allow to stage specific AST nodes. It would also mean that you can't stage partial code (that produces syntax errors)" - bapak
  • "The reason people keep source code as text as it's really a global maximum. The non-text format gives you a modest speedup, but at the expense of imposing incredible version compatibility pain." - theamk
  • "It's a really good idea and I can’t believe people here are complaining that it’s bad because they can’t use grep! But that’s a good thing!! Who the hell is grepping code as if code had no structure and that’s the best you can do?" - brabel