Here's a summary of the themes from the Hacker News discussion:
The Perceived Redundancy and Misuse of DISTINCT
A significant portion of the discussion revolves around the DISTINCT
keyword in SQL. While some users see it as a simple syntactic shortcut or a way to achieve a specific result, many view it critically, often associating it with poor query design and a lack of understanding of the underlying data or schema.
- "Not ashamed to admit that I never really thought about the distinct operator 'being redundant' as its essentially just a group by." - potatoproduct
- "distinct has always felt like a query smell to me. Too many junior analysts abusing it because they don't know the schema well and are over-joining entities" - morkalork
- "DISTINCT is often a smell at the head (or middle) of a complex query as you are throwing away processed information, sometimes a lot of it, late in the game. Much better to filter it out earlier and not process it further, where possible." - dspillett
- "SELECT DISTINCT is often a code smell. (Not always.) If you see it, there’s a 70% chance it got slapped on to fix an issue that should have been solved a different way." - paulddraper
- "For a question worth 2 points, if you use the word "DISTINCT" when it wasn't needed, you lose 0.5 points. - If you don't use "DISTINCT" when it was necessary, you lose all 2 points." - aspaviento (referencing a teacher's strict rules)
The Importance of Proactive Filtering and Early Optimization
Closely related to the DISTINCT
discussion is the idea that filtering and data reduction should happen as early as possible in a query. This is seen as more efficient and likely to leverage indexes, ultimately leading to better performance.
- "DISTINCT is often a smell at the head (or middle) of a complex query as you are throwing away processed information, sometimes a lot of it, late in the game. Much better to filter it out earlier and not process it further, where possible. Filtering earlier, as well as reducing waste processing time (and probably memory use), increases the chance of the query planner being able to use an index for the filter which could greatly decrease the IO cost of your query." - dspillett
- "Sometimes the number of joins is fine but they don’t understand the data properly and should be spending more time understanding why multiple rows are being returned when they expect one (eg they need to filter on an additional field)." - ryanjshaw
The Nature of SQL: Declarative vs. Imperative and the "Black Box" Aspect
There's a debate about the true nature of SQL and how much it "hides" or reveals about its execution. Some argue it's declarative and hides execution details, while others contend that understanding the underlying execution is crucial for performance, making it a "leaky abstraction."
- "SQL is a declarative language, and will produce exactly what you tell it to, provided you understand what it is you asked it to do. The query engine is somewhat of a black box, but that is completely orthogonal." - sgarland
- "SQL is a declarative language so it —- by definition —- hides the execution. Not really sure what you’re trying to argue here." - paulddraper
- "SQL hiding the execution is an extremely leaky abstraction. To get the performance you need, you have to plan your possible queries in advance together with how to structure the data." - crazygringo
- "Ha ha, no, SQL implementations can conform to the standard in unexpected ways. NULL = NULL Is that true or false? We didn't know until 2003." - chasil (highlighting unexpected SQL behavior)
The Value of Data Structures and Database Internals Over SQL Syntax
One contributor suggests that developers should focus less on mastering SQL syntax and more on understanding data structures and how databases organize and index data. The argument is that a solid understanding of these fundamentals, rather than the SQL language itself, is more valuable for designing performant systems.
- "The set of data structures that you use to model and index a dataset is worth understanding, and designing in that space is a skill worth learning." - alphazard
- "You can imagine a database as a bunch of trees, hashmaps and occasionally other stuff, protected by a lock. First you acquire the lock, then you update some of the data structures, like indexes... By default, most data will live in a BTree with an integer primary key... Indexes will be BTrees where the key is stuff you want to query on, and the value is the primary key in the main table." - alphazard
- "belfthrow: Java is a bad language for this compared to go? Is this legitimate advice on a serious programming blog. Pretty unbelievable honestly." (This is a reaction to the advice on learning data structures via languages like Go).
- "sgarland: Frankly, this is terrible advice. If you’re not designing your data model around the language it’s going to be queried in, how do you expect to get decent performance out of the database?" - sgarland
Debugging and Understanding Query Execution
The discussion touches on how developers debug SQL queries. Compared to imperative languages where console.log
is common, debugging SQL often involves breaking down queries into smaller parts, using subqueries or Common Table Expressions (CTEs), to inspect intermediate results.
- "In an imperative language if you don’t yet understand (or make a typo, or whatever), you can just print/console.log and find out." - halfcat
- "The equivalent of
print
/console.log
in SQL would be using subqueries/CTE and run them to see the intermediate result (just likeprint
/console.log
show you intermediate results of the executions in an imperative language)." - SkiFire13
Database-Specific Syntax and Learning Curves
There's a brief mention of how database-specific syntax or features can impact query writing and understanding, implying that a "one-size-fits-all" approach to SQL learning might not be entirely accurate due to these variations.
- "We use Oracle syntax and write
instead of ANSI SQL AS . Ditto for table expressions" - datadrivenangel - "jslaby: T-SQL can handle that alias expr just fine, but the seqNum returned is 4,8,12,16,20 instead of the 1,2,3... I tried on MySQL and it works fine. I'm not sure how SQL Server is handling those cartesian joins differently"
The Power and Beauty of SQL for Business Logic
A counterpoint is raised, emphasizing that SQL, despite its complexities, is powerful for implementing intricate business logic concisely. The emergence of SQL macros further enhances its capability for reusable patterns.
- "SQL is beautiful in its own way. Definitely not easy to master but beautiful in how much business logic you can implement in not many lines of code." - phartenfeller
- "And with SQL macros becoming a thing it is now easily possible to store patterns as reausable functions with parameters." - phartenfeller