Essential insights from Hacker News discussions

Float Exposed

This Hacker News discussion touches upon two primary themes: the existence and purpose of new top-level domains (TLDs) like ".exposed" and ".sucks", and the intricacies and practical implications of floating-point number representation in computing.

The Rationale and Market for New gTLDs

A significant portion of the discussion revolves around the emergence of new generic Top-Level Domains (gTLDs), sparked by a question about the purpose of the ".exposed" TLD. Users point to established reasons for their creation, including providing new avenues for online identity, self-expression, and information sharing, as articulated by the registry operator:

"THE .EXPOSED TLD This TLD is attractive and useful to end-users as it better facilitates search, self-expression, information sharing and the provision of legitimate goods and services. Along with the other TLDs in the Donuts family, this TLD will provide Internet users with opportunities for online identities and expression that do not currently exist. In doing so, the TLD will introduce significant consumer choice and competition to the Internet namespace – the very purpose of ICANN’s new TLD program."

The motivations behind such TLDs are often linked to market demand and the creation of niche online spaces, even if those spaces are for criticism or specific types of content:

"For the same reason there's a .sucks TLD. There's a market for it."

Examples are given to illustrate how domain registration prices vary across different TLDs for the same or similar names, suggesting a speculative market at play:

"So windows.sucks and linux.sucks are available and 2000 USD/year, emacs.sucks is 200 USD/year and vi.sucks is already registered (but no website unfortunately)! On the other hand linux.rocks and windows.rocks are taken (no website), vi.rocks is 200 USD/year and emacs.rocks is just 14 USD/year."

The existence of services that buy up potentially valuable or controversial domains to prevent their use is also mentioned, indicating a strategic element in domain registration:

"Pretty sure there's a domain monitoring service for similarly or something along these lines that buys up domains like these to prevent usage."

The Perils and Precision of Floating-Point Arithmetic

A substantial part of the conversation is dedicated to the complexities and potential pitfalls of using floating-point numbers in computational tasks, particularly in areas like game development and scientific computing. Many users share insights and examples highlighting where floating-point precision issues can lead to unexpected behavior and significant errors.

Precision Loss with Distance from Origin

A core concern is the loss of precision when numbers become very large or very small, or when computations are performed far from the origin (0,0,0). This is illustrated with examples from game development:

"This came during my OMSCS Game AI course as an example of the dangers of using floats to represent game object location. If you get further from the origin point (or another game element you are referencing from) you are losing precision as the float needs to use more of the significand to store the larger value."

This problem can necessitate strategies like sector-based coordinate systems to manage precision:

"Define boundary conditions -- how much precision do you need? Then you can compute the min/max distances. If the 'world' needs to be larger, then prepare to divide it into sectors, and have separate global/local coordinates (e.g. No Man's Sky works this way)."

The infamous "Far Lands" in Minecraft are cited as a well-known example of these effects in real-world applications:

"I love how that became part of the 'mythology' of Minecraft as the "Far Lands", where travelling far from the world origin made terrain generation and physics break down, subtly at first, and less so as you kept going."

The Importance of Accurate Summation and Historical Errors

The way floating-point numbers are added significantly impacts accuracy. A more elaborate summation method, such as summing pairs recursively, yields more accurate results than straightforward sequential addition:

"If you have a large set of lets say floats in the range between 0 and 1 and you add them up, there is the straightforward way to do it and there is a way to pair them all up, add the pairs, and repeat that process until you have the final result. You will see that this more elaborate scheme actually gets a way more accurate result vs. simply adding them up."

Historical incidents, like the Patriot missile system failure due to float inaccuracies in time accounting, serve as stark warnings about the real-world consequences:

"I remember Patriot missile systems requiring a restart because they did time accounting with floats and one part of the software didn't handle the corrections for it properly, resulting in the missiles going more and more off-target the longer the system was running."

The Structure and Mechanics of Floating-Point Representation

Users delve into the technical details of how floating-point numbers are structured, explaining concepts like the mantissa and exponent and their roles in determining precision and range. The subdivision of a number range is described intuitively:

"Because the mantissa (like everything else) is encoded in binary, the first explicit (because there's implicit 1. at the beginning) digit of it means either 0/2 or 1/2 (just like in decimal the first digit after the dot means either 0/10 or 1/10 or 2/10...), the next digit is (0/2Β² = 0/4) or 1/4, third digit is 0/8 or 1/8 etc. You can visualize this by starting at the beginning of the "window", and then you divide the window into 2 halves: now the first digit of the mantissa tells you if you stay at the beginning of the first half, or move to the beginning of the 2nd half."

The relationship between the exponent and mantissa bits is explained as controlling the "window size" and the "subdivision" of that window.
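
That window-and-subdivision picture maps directly onto the bit layout of a 32-bit float. Here is a minimal C++ decoding sketch (the example value 6.5 is arbitrary):

    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main() {
        float x = 6.5f;

        // Copy the bits out with memcpy (well-defined, unlike pointer casting).
        std::uint32_t bits;
        std::memcpy(&bits, &x, sizeof bits);

        std::uint32_t sign     = bits >> 31;
        int           exponent = static_cast<int>((bits >> 23) & 0xFF) - 127;  // remove bias
        std::uint32_t mantissa = bits & 0x7FFFFF;                              // 23 explicit bits

        // The exponent picks the window [2^e, 2^(e+1)); the mantissa picks one
        // of 2^23 evenly spaced positions inside that window.
        double window   = std::ldexp(1.0, exponent);
        double fraction = mantissa / 8388608.0;   // mantissa / 2^23

        std::printf("sign=%u  exponent=%d  mantissa=0x%06X\n",
                    static_cast<unsigned>(sign), exponent,
                    static_cast<unsigned>(mantissa));
        std::printf("window [%g, %g), position %g -> value %g\n",
                    window, 2 * window, fraction, window * (1.0 + fraction));
        return 0;
    }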

Floating-Point Comparisons and Ordering

An interesting point raised is that floating-point numbers can often be compared as if they were signed integers, with certain caveats:

"My favorite FP Fun Fact is that float comparisons can (almost) use integer comparisons. To determine if a > b, reinterpret a and b as signed ints and just compare those like any old ints. It (almost) works!"

However, this simplification is not universally true, particularly with negative numbers due to differences in encoding schemes (sign-magnitude vs. two's complement):

"This isn't accurate. It's true for positive numbers, and when comparing a positive to a negative, but false for comparisons between negative numbers. Standard floating point uses sign-magnitude representation, while signed integers these days use 2s-complement."

Posits, an alternative number representation, are mentioned as having an advantage in this regard, sorting like integers across all values.

The Debate: Floats vs. Fixed-Point Arithmetic

A lively debate emerges regarding the continued reliance on floating-point arithmetic versus the use of fixed-point representations. Some argue that floating-point is fundamentally flawed and "sloppy," advocating for fixed-point for its determinism and predictable behavior:

"Real programmers don't use floating points, only sloppy lazy ones do. Real programmers use fixed point representation and make sure the bounds don't overflow/underflow unexpectedly."

This perspective suggests that the historical reliance on floating-point hardware has led computing down the "wrong path," hindering progress in areas like reproducible research and secure distributed systems due to non-determinism.

"The absolute GPU cluster-fuck with as many floating types as you can write on a napkin while drunk at the bar, mean that at the end of the day your neural network is non-deterministic, and you can't replicate any result from your program from 6 month ago, or last library version. Your simulations results therefore are perishable."

Conversely, others defend floating-point, highlighting its practical advantages in handling vast dynamic ranges and its suitability for graphics and simulations where gradual precision loss is often preferable to the catastrophic failures of fixed-point overflow:

"If you work with FPGAs, then converting a known to be working algorithm to fixed point is one of the most time consuming things you can do and it eats DSP slices like crazy. Every square or square root operation will cause a shiver to run down your spine because of the lack of dynamic range."

"For games and most simulation, the 'soft failure' of gradual precision loss is much more desirable than the wildly wrong effects you would get from fixed-point overflow."

Practical Strategies for Handling Floats

The discussion also touches upon practical methods for managing floating-point numbers, including:

  • Visualizations: Tools that visually explain floating-point representations are praised for their educational value.
  • Canonical String Representations: The challenge of finding the shortest, unambiguous decimal representation of a float leads to discussions about algorithms like Dragon4 and Grisu3, and the use of std::numeric_limits<float>::max_digits10.
  • Serialization: Methods for reliably serializing and deserializing floating-point numbers are explored, with suggestions ranging from binary representations to hexadecimal float literals. The use of pointer casting for this purpose is noted as potentially leading to undefined behavior, with unions or memcpy being safer alternatives; the sketch after this list illustrates both the text and binary round trips.
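
As a minimal C++ round-trip sketch of those last two points (illustrative only; it assumes an ordinary IEEE 754 single-precision float):

    #include <cassert>
    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <limits>

    int main() {
        float original = 0.1f;

        // Text round trip: max_digits10 (9 for float) decimal digits are
        // enough for the parsed value to reproduce the exact bit pattern.
        char buf[64];
        std::snprintf(buf, sizeof buf, "%.*g",
                      std::numeric_limits<float>::max_digits10, original);
        float reparsed = std::strtof(buf, nullptr);
        assert(std::memcmp(&reparsed, &original, sizeof original) == 0);
        std::printf("text: %s\n", buf);

        // Hexadecimal float form: exact by construction.
        std::printf("hex:  %a\n", original);

        // Binary round trip: copy the bits with memcpy rather than casting a
        // float* to uint32_t*, which would be undefined behaviour.
        std::uint32_t bits;
        std::memcpy(&bits, &original, sizeof bits);
        float restored;
        std::memcpy(&restored, &bits, sizeof restored);
        assert(std::memcmp(&restored, &original, sizeof original) == 0);
        std::printf("bits: 0x%08X\n", static_cast<unsigned>(bits));
        return 0;
    }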

Overall, the thread showcases a deep understanding of both the infrastructural elements of the internet (TLDs) and the fundamental, often overlooked, challenges in numerical computation that underpin much of modern technology.