Essential insights from Hacker News discussions

Semantic Line Breaks (2017)

This Hacker News discussion concerns "semantic line breaks": starting each sentence or logical clause on a new line in the source text, for the benefit of the writer and editor, without necessarily affecting the rendered output. The conversation covers the practice's historical roots, its practical implications, the technical challenges it raises, and differing opinions on its utility.

Historical Precedent and Unix Philosophy

The discussion opens by noting that placing each sentence on its own line has historical precedent: the advice is attributed to Brian Kernighan in 1974. According to one commenter, the early Unix documentation was written this way.

"Start each sentence on a new line. Make lines short, and break lines at natural places, such as after commas and semicolons, rather than randomly. Since most people change documents by rewriting phrases and adding, deleting and rearranging sentences, these precautions simplify any editing you have to do later." — Brian Kernighan, 1974

"This is how all the Unix documents were written." - kps

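Kernighan's rule can be approximated mechanically. The following is a minimal sketch, not a tool from the thread: the function name `semantic_breaks`, the `max_len` threshold of 60, and the naive punctuation-based sentence detection are all assumptions made for illustration, and abbreviations such as "e.g." would be split incorrectly.

```python
import re


def semantic_breaks(text: str, max_len: int = 60) -> str:
    """Put each sentence on its own line; split long ones after , or ;.

    Hypothetical helper: the sentence detection is deliberately naive.
    """
    # Collapse existing whitespace so the input's own wrapping is ignored.
    flat = " ".join(text.split())
    # Naive sentence boundary: '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    lines = []
    for sentence in sentences:
        if len(sentence) <= max_len:
            lines.append(sentence)
        else:
            # Break long sentences at clause boundaries, not at a column.
            lines.extend(re.split(r"(?<=[,;])\s+", sentence))
    return "\n".join(lines)


if __name__ == "__main__":
    print(semantic_breaks(
        "Start each sentence on a new line. Make lines short, and break "
        "lines at natural places, such as after commas and semicolons, "
        "rather than randomly."
    ))
```

Short sentences stay whole, so only sentences longer than `max_len` are broken at commas and semicolons.
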
Technical Challenges with Markup Languages and Rendering

A significant portion of the debate focuses on how markup languages and their renderers handle line breaks, and on the difficulties of implementing semantic line breaks correctly. Breaking after an em dash is the main example: renderers convert a source line break into a space, but in most locales an em dash is not surrounded by spaces, so the rendered output gains a stray space.

"A semantic line break SHOULD occur after an […] em dash (—). I agree with this, however it means that no existing markup language supports semantic line breaks, because every last one of them just turns the break into a space—and em dashes are, in most locales, not to be surrounded by a space. Consequently, you’ll end up with a stray space if you do this." - chrismorgan

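The stray space described in the quote above can be reproduced in a few lines. This is a minimal sketch and does not model any real renderer: `join_naive` and `join_dash_aware` are hypothetical names, and the dash-aware rule is only an assumption about what a fix could look like.

```python
EM_DASH = "\u2014"


def join_naive(source: str) -> str:
    """Turn every source line break into a single space."""
    return " ".join(line.strip() for line in source.splitlines())


def join_dash_aware(source: str) -> str:
    """Like join_naive, but add no space after a line ending in an em dash."""
    out = ""
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        # Only add the separating space when the previous line
        # did not end with an em dash.
        if out and not out.endswith(EM_DASH):
            out += " "
        out += line
    return out


source = "It works well" + EM_DASH + "\nmost of the time."

if __name__ == "__main__":
    print(repr(join_naive(source)))
    print(repr(join_dash_aware(source)))
```

The naive join leaves a space between the dash and "most"; the dash-aware join does not.
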
Languages with no word separators are raised as a further complication, as is the fact that CSS currently leaves it to the user agent to decide whether a remaining source line break (a "segment break") becomes a space or is removed.

"CSS presently just leaves such decisions UA-defined <https://drafts.csswg.org/css-text-4/#line-break-transform>: any remaining segment break is either transformed into a space (U+0020) or removed depending on the context before and after the break. The rules for this operation are UA-defined in this level." - chrismorgan

The Utility of Lightweight Markup Languages (LMLs)

The conversation also highlights the idea of creating custom lightweight markup languages to address these specific issues, with one user sharing their experience of developing their own LML.

"More folks should define their own lightweight markup languages! It’s fun and makes your writing and notes feel more like your own." - photon_garden

The flexibility of Markdown, particularly with tools like Pandoc, is also mentioned as a way to allow for semantic breaks without affecting the output.

"Good thing about Markdown is that the lack of a proper spec means you can pick one you like (when possible). Pandoc for instance treats input Markdown line-breaks in a sane way, allowing semantic breaks to not affect the output." - 3036e4

Pros and Cons for Authors and Editors

Proponents of semantic line breaks emphasize the benefits for the author and editor, such as improved readability of the source text, easier identification of sentence boundaries, and reduced noise in version control diffs.

"Adding a line break after each sentence makes it easier to understand the shape and structure of the source text" - Original author, quoted by eviks

"I also wonder, why conceal bits of information from readers, while they could possibly benefit of them the same way editors and writers do. Admittedly, the outcome then seem like a poetry, but … why not?" - myfonj

"The main reason I use semantic line breaks, not explicitly mentioned in this article, is that it minimizes reformatting when editing. Only the subclause being edited is reformatted, while the rest of the paragraph remains as-is. This also minimizes the changes in line-oriented diffs." - layer8

"I’ve often thought this would be useful for version control and change review, since it allows diffs to be a lot less noisy. I’m imagining how much easier it would be to review a PR with significant README edits if the file was already structured with semantic line breaks." - jsdalton

"Edits show up as a -/+ on just the sentence or clause that has changed. Contrast with hard-wrapped text, where a single word change towards the beginning of a paragraph can cause the entire paragraph to be replaced in the diff view, as things reflow." - meatmanek

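The diff argument made by layer8, jsdalton, and meatmanek can be checked with Python's standard `difflib` and `textwrap` modules. This is a rough sketch: the sample text, the 40-column wrap width, and the helper names `changed_lines` and `hard_wrap` are illustrative assumptions, not anything taken from the thread.

```python
import difflib
import textwrap


def changed_lines(before: str, after: str) -> int:
    """Count added plus removed lines in a line-oriented diff."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return sum(1 for d in diff if d.startswith(("+ ", "- ")))


def hard_wrap(text: str, width: int = 40) -> str:
    """Reflow the text to a fixed column, ignoring its own line breaks."""
    return textwrap.fill(" ".join(text.split()), width=width)


# One sentence per line, then an edit near the start of the paragraph.
semantic_before = (
    "The quick fox jumps over the dog.\n"
    "It then runs back across the whole field to its den.\n"
    "Nobody in the village ever sees it again."
)
semantic_after = semantic_before.replace("quick", "remarkably quick brown")

semantic_count = changed_lines(semantic_before, semantic_after)
wrapped_count = changed_lines(
    hard_wrap(semantic_before), hard_wrap(semantic_after)
)

if __name__ == "__main__":
    print("semantic breaks:", semantic_count, "changed lines")
    print("hard-wrapped:   ", wrapped_count, "changed lines")
```

With semantic breaks the edit touches a single source line (one removal, one addition), while the hard-wrapped version reflows and reports more changed lines for the same edit.
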
Counterarguments and Concerns

Critics, however, raise concerns about the impact on the readability of the raw source text for general readers or collaborators who might not adhere to the same conventions. They also point out potential conflicts with editors that automatically reflow text.

"The problem is that this makes having line breaks that are not paragraph breaks in the output much more awkward and I think those are much more important than line breaks that are only there in the source." - account42

"Nope again, visually you've just wasted my devices width or overestimated my smartphone's width and I get exactly the same issue you've just complained about: a single sentence that doesn't fit. Semantically, what you're looking for already exists and is called a paragraph. A sentence has a different meaning, which you break by line breaking after every single one. It kills the structure, not "makes it easier to understand the shape and structure of the source text"" - eviks

"There is a very good technical argument for NOT using "semantic" line breaks when editing markup source code, especially of the "hardwrap" variety, and that is the ability to easily diff two versions of the same document, e.g. when comparing latex git commits. Anything that reorganises the sentence around for the sake of maintaining justification, completely destroys any meaningful diff from taking place." - tpoacher

"If your editor auto reflows the text, that will conflict with this, by erasing line breaks you inserted. This is imposing an 80-character line length limit. With a line length limit, I want an editor to reflow my text so I don't have to do the line length limit manually." - Thorrez

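The conflict Thorrez describes is easy to demonstrate with the standard `textwrap` module, used here as a stand-in for an editor's reflow command. The second function is a hypothetical compromise, not something proposed in the thread: it wraps only lines that exceed the limit and leaves shorter lines alone.

```python
import textwrap


def naive_reflow(source: str, width: int = 80) -> str:
    """Reflow the whole paragraph, discarding the author's line breaks."""
    return textwrap.fill(source, width=width)


def reflow_long_lines_only(source: str, width: int = 80) -> str:
    """Keep existing breaks; wrap only the lines longer than the limit."""
    return "\n".join(
        textwrap.fill(line, width=width) if len(line) > width else line
        for line in source.splitlines()
    )


source = "Keep sentences apart.\nEdit one at a time.\nReview small diffs."

if __name__ == "__main__":
    print(naive_reflow(source))
    print("---")
    print(reflow_long_lines_only(source))
```

The naive reflow merges the three sentences into one line, while the per-line variant returns the source unchanged because no line exceeds the limit.
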
The Role of Editors, Preprocessing, and Customization

The discussion touches on solutions such as specialized editors, preprocessing steps (for example, regex replacement before rendering), and the judicious use of U+200B ZERO WIDTH SPACE (ZWSP) and <wbr> tags to manage line breaks. Tools such as sembr, a command-line tool for semantic line breaks, are also mentioned.

"Unicode has U+200B ZERO WIDTH SPACE for that purpose. In HTML and hence Markdown you can also use <wbr>." - layer8

"If you’re using a custom setup anyway, you can have it be inserted automatically by regex replacement, as a pre-rendering step." - layer8

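A pre-rendering step of the kind layer8 suggests could look like the sketch below. The thread does not spell out exactly what is inserted where, so this is one reading only: it assumes the character is U+200B and that it goes after an em dash ending a source line. The function name `mark_dash_breaks` is hypothetical, and whether a given renderer then drops the break instead of emitting a space is not checked here.

```python
import re

ZWSP = "\u200b"      # U+200B ZERO WIDTH SPACE
EM_DASH = "\u2014"   # U+2014 EM DASH

# An em dash, optional trailing spaces or tabs, then the line break.
_DASH_AT_EOL = re.compile(EM_DASH + r"[ \t]*\n")


def mark_dash_breaks(source: str) -> str:
    """Insert U+200B after an em dash that ends a source line.

    Assumption: this placement is an interpretation of the thread,
    not a documented convention of any markup language.
    """
    return _DASH_AT_EOL.sub(EM_DASH + ZWSP + "\n", source)


if __name__ == "__main__":
    marked = mark_dash_breaks("It works well" + EM_DASH + "\nmost of the time.")
    print(repr(marked))
```

Running the step twice gives the same result as running it once, because a dash already followed by U+200B no longer matches the pattern.
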
"I made a command-line tool [0] powered by Transformer models that performs semantic linebreaks to breaks lines in a text file at semantic boundaries. It supports multiple file types including LaTeX, Markdown, and plain text, with automatic file type detection." - admko

Personal Perspective and "Offended" Readers

A notable point of contention is whether semantic line breaks in the raw source are a problem for readers at all. Some argue that the practice exists for the writer's and editor's benefit and has no effect on the rendered output. Others counter that source files are often read and edited by several people, so the source should be just as easy to read. One commenter goes further and suggests that the aversion to such formatting is "psychological".

"I think you might be misunderstanding. The semantic line breaks described here are not shown to readers. They are visible only to the person writing/editing the text, as a tool for their own use. If you aren't someone who finds a tool like this useful for your own writing, then no worries! Nobody has been harmed by this existing but not being used. It has no effect on the result." - dkh

"But in my experience most text that gets rendered is also read and edited by multiple people in its source form, so why wouldn't you want to make source just as easy to read?" - riffraff

"Your aversion appears to be psychological. It seems to me like you have trouble examining things by the sum of their parts and semantic line breaks agitate this." - tolerance