This Hacker News discussion revolves around the efficiency and practical application of sorting algorithms and data structures, touching on premature optimization versus pre-emptive research, and the surprising effectiveness of certain techniques.
The Unreasonable Effectiveness of Modern Sorts and Hash Functions
A central theme is the appreciation for the advancements in the performance of everyday developer tools like sorting and hashing. Users express surprise and gratitude for how much simpler their work has become due to these improvements.
- "The efforts of developing better sorting algorithms like driftsort/ipnsort and better hash functions like foldhash make my life as developer so much simpler. No matter how clever I try to be, most often just using foldhash hashmap or a sort_unstable is the fastest option" - conradludgate
This sentiment is echoed by the observation that textbook algorithm analysis often falls short of predicting performance on real CPUs.
- "Cool article. It's clear that all my theoretical algorithm-knowledge comes short when faced with real CPUs." - Epa095
The discussion also highlights how sorting can be a powerful, general-purpose "hack" for solving various problems.
- "I feel that sorting data is the ultimate computer science hack. Many, many, classes of problems turn into O(log n) problems, once you sort your input in some way. It might not be the most effective way of solving the problem, but it's often a fairly good one." - lukaslalinsky
- "So I'm really enjoying how good sorting algorithms are getting and how despite the O complexity remains mostly the same, the real computing efficiency is improving significantly." - lukaslalinsky
Efficiency versus Effectiveness: Getting the Terms Right
A subtle but important distinction is made between "efficiency" and "effectiveness" when discussing algorithms: effectiveness refers to correctness, while efficiency refers to speed and resource usage.
- "Efficiency, not effectiveness. They are all effective in the sense that they produce sorted results. Even the non-modern sort algorithms are effective in the sense that the results are correct. This should be about the efficiency with which they do it, right?" - DennisL123
- "Agreed. Effectiveness would imply that some algorithms are more likely to sort the list correctly than others, or they sort a higher percentage of elements. Efficiency is about factors external to the correctness" - aabhay
This distinction helps frame the core of the discussion: why do some well-established, theoretically sound algorithms seem to fall short in practice compared to newer, or even older, optimized implementations?
The Nature of Optimization: Premature vs. Pre-emptive Research
A significant portion of the conversation grapples with whether the exploration of niche performance improvements constitutes "premature optimization." The prevailing view, however, is that such work is more akin to valuable research or pre-emptive investigation.
- "Isn't this just another case of premature optimization? Shouldn't you be adjusting sorting algorithms only when customer complains?" - dvh
- "Also known as premature optimization. You had to literally invent new dataset just to show there is a difference. You are inventing problems, stop doing that!" - dvh
Counterarguments emphasize that this type of work is crucial for pushing boundaries and can lead to unexpected breakthroughs.
- "Sometimes that is how useful jumps are made. Maybe someone will come along with a problem and the data they have just happens to have similar properties. Rather than premature optimisation this sort of thing is pre-emptive research - better to do it now than when you hit a performance problem and need the solution PDQ. Many useful things have come out of what started as “I wonder what if …?” playing." - dspillett
- "It only makes sense to talk about premature optimization in the context of building a production system (or a standard library). This is research or experimentation, designed to improve our understanding of the behavior of algorithms. Calling it premature optimization makes no sense." - hyperpape
The article itself seems to suggest a balance, cautioning against optimizing for limited scenarios where it might introduce failure modes, while acknowledging that complacency can lead to a "leaning tower of hacks."
- "I think the article basically had this conclusion. Think twice before optimising here because you may be able to squeeze something out for a very limited scenario but it can have ugly failure modes and it end up being slower in some cases. Plus it takes time and effort. And modern standard sorts are "unreasonably" fast anyway for many practical purposes. Then again only thinking of fixing things when a customer complains is a way to end up with a leaning tower of hacks which eventually ossify and also the customer (or rather the users, who may not be the customer especially in business software) may be putting up with dozens of niggles and annoyances before they bother to actually report one bug because they can't work around it." - grues-dinner
Rethinking Problem Structures: Beyond Global Ordering
The discussion touches on how problem structures can be reframed to avoid the cost of maintaining global order, such as using tournament selection or heaps instead of full sorts.
- "I find in practice that if the sorting process is too slow, you should begin thinking about different ways to attack the problem. Maintaining a total global order of things tends to only get more expensive over time as the scope of your idea/product/business expands. The computational complexity of the sort algorithm is irrelevant once we get into memory utilization. This is why we have things like tournament selection. Randomly sampling from the population and running tournaments is way more scalable than scanning and ordering a global list each iteration. You can maintain things like an ELO score with very narrow views into memory. Nothing needs a global view yet you get global effects." - bob1029
This led to a request for practical examples and explanations of these alternative approaches.
- "Could you give an example of reframing a problem from totally ordered complete data to a sampled tournament? I can imagine cases amenable to sampling, but since sampled data is smaller not sure why I wouldn't just sort it." - spiffytech
- "I don't yet see how tournament selection could work here, could you explain? Sometimes when you think you need to maintain a sorted array under item insertion, it turns out that you only ever need to continually read the next-smallest (or next-largest) item -- and in that case, it suffices to maintain a heap, which is much cheaper." - akoboldfrying
bob1029's follow-up explanation of evolutionary algorithms and tournament selection illustrates how avoiding global ordering can improve scalability.
- "An example would be an evolutionary algorithm that relies on a large population to maintain diversity. As you get into 6-7 figure population size, ranking the whole set starts to take a really long time (relatively speaking) each iteration. This also requires a serialized phase of processing that halts all workers for the duration. With tournament selection, you can randomly pick indexes from the population to build tournaments, which is effectively instant. There is no more requirement for a serialized processing phase. All processors can build their own random tournaments and perform updates of scores. There will be occasional conflict on score updates but the idea is that with enough samples/iterations it becomes very accurate." - bob1029
The Art of Low-Level Optimization and Unpacking Abstractions
The discussion includes a deep dive into low-level CPU optimizations, specifically using AVX2 instructions for efficient sorting or summing.
- "Your "Branchless" approach could indeed be implemented very efficiently in a CPU with AVX2 (256-bit-wide vectors). With the current element in rax, the 4 valid values in ymm2, and the 4 running totals in ymm3 (initially zero), the inner loop would be just:
VPBROADCASTQ rax,ymm1 VPCMPEQQ ymm1,ymm2,ymm1 VPADDQ ymm1,ymm3,ymm3
VPBROADCASTQ copies rax into each of the 4 lanes in ymm1. The VPCMPEQQ sets each qword there to all-0 ( = 0) or all-1 ( = -1) depending on the comparison result, so the VPADDQ will accumulate 4 running negative totals into ymm3, which can be negated afterwards. I would still expect the perfect hash function approach to be faster, though -- a similar number of operations, but 25% of the memory movement." - akoboldfrying
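The same inner loop, rendered as a sketch with Rust's std::arch AVX2 intrinsics. The function and its signature are illustrative, not from the article or the thread, and the caller must first verify AVX2 support (e.g. with is_x86_feature_detected!):

```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn count_matches_avx2(data: &[i64], valid: [i64; 4]) -> [i64; 4] {
    use std::arch::x86_64::*;
    // ymm2 analogue: the four valid values.
    let valid_v = _mm256_loadu_si256(valid.as_ptr() as *const __m256i);
    // ymm3 analogue: four running (negative) totals, initially zero.
    let mut totals = _mm256_setzero_si256();
    for &x in data {
        // VPBROADCASTQ: copy the current element into all four lanes.
        let broadcast = _mm256_set1_epi64x(x);
        // VPCMPEQQ: each lane becomes all-1 (-1) on equality, all-0 otherwise.
        let eq = _mm256_cmpeq_epi64(broadcast, valid_v);
        // VPADDQ: accumulate -1 per matching lane into the running totals.
        totals = _mm256_add_epi64(totals, eq);
    }
    let mut out = [0i64; 4];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, totals);
    // Negate the negative totals to obtain positive counts.
    out.map(|t| -t)
}
```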
This technical exchange also involves clarifying the mechanics of the proposed instructions, highlighting potential misunderstandings about memory operations.
- "“Memory movement”? None of the instructions you list involve memory." - Sesse__
The conversation also touches on the elegant simplification of certain problems, like identifying that specific comparisons can be reduced to inspecting the lowest bits of values.
- "I find the perfect hash implementation a bit weird; it seems to obfuscate that you simply look at the lowest two bits (since they differ between the four values). You can do the x + 3 and 3 - expr at the very end, once, instead of for every element." - Sesse__
Meme-ification in Tech Discourse
A recurring meta-theme is the playful imitation of well-known academic paper titles within the tech community, often for humorous effect.
- "\"The Unreasonable Effectiveness of Mathematics in the Natural Sciences\" is one of those titles that gets imitated a lot for some reason. Maybe even more than \"Goto Considered Harmful\"." - creata
- "Coming next: “What we talk about when we talk about modern sort algorithms”" - kwertyoowiyop
- "or "we need to talk about what we talk about when we talk about the unreasonable effectiveness of title memes are all you need considered harmful"" - rcxdude
This highlights the shared cultural references and inside jokes that shape technical discussions.