Essential insights from Hacker News discussions

Poor man's bitemporal data system in SQLite and Clojure

This Hacker News discussion revolves around the concept of bitemporality in databases, exploring its benefits, challenges, and practical implementations.

The Value and Complexity of Bitemporality

A central theme is the recognition that bitemporality, the ability to query data based on both when a fact was recorded (transaction time) and when it was valid (valid time), addresses common and complex real-world data needs. Users observed that the need for such queries, like "What was the P&L for March using the data available on Apr 4?", arises surprisingly often, while database designs that support them are surprisingly rare.

  • moomin stated, "It's amazing how often you want to know 'What was the P&L for March using the data available on Apr 4?' and how uncommon it is to find a database design that supports that kind of query."
  • refset elaborated on the broader issues bitemporality solves, clarifying that in a bitemporal database, "every regular looking join over the current state of the world is secretly an AsOf join (across two dimensions of time), without constantly having to think about it when writing queries or extending the schema."
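moomin's P&L question can be sketched with transaction-time versioning in SQLite (here via Python's sqlite3). The schema, table name, and figures below are invented for illustration; a full bitemporal model would also carry valid-time ranges on each row, not just a coarse month key.

```python
import sqlite3

# Hypothetical schema: each row records when a version of the fact was
# recorded (tx_from) and when it was superseded (tx_to).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE pnl (
        month   TEXT,   -- valid time (coarse): the month the P&L is for
        amount  REAL,
        tx_from TEXT,   -- transaction time: when this version was recorded
        tx_to   TEXT    -- when superseded ('9999-12-31' = still current)
    )
""")
# March P&L as first recorded on Apr 2, then corrected on Apr 10.
con.executemany("INSERT INTO pnl VALUES (?, ?, ?, ?)", [
    ("2024-03", 100.0, "2024-04-02", "2024-04-10"),
    ("2024-03", 150.0, "2024-04-10", "9999-12-31"),
])

def pnl_as_of(month, as_of):
    """'What was the P&L for `month` using the data available on `as_of`?'"""
    row = con.execute(
        "SELECT amount FROM pnl WHERE month = ? AND tx_from <= ? AND tx_to > ?",
        (month, as_of, as_of),
    ).fetchone()
    return row[0] if row else None

print(pnl_as_of("2024-03", "2024-04-04"))  # 100.0 (the pre-correction figure)
print(pnl_as_of("2024-03", "2024-04-15"))  # 150.0 (after the correction)
```

The point of refset's "secretly an AsOf join" remark is that a bitemporal engine applies this transaction-time filter to every query implicitly, so the schema never needs hand-rolled `tx_from`/`tx_to` columns.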

Bitemporality vs. As-Of Joins in Modern Databases

The discussion also highlights a distinction between bitemporality and the "AsOf" join functionality increasingly found in modern analytical databases. While AsOf joins can handle "transaction time" or "valid time" in specific scenarios, they are seen by some as solving a more limited problem compared to a full bitemporal model.

  • necubi listed several databases supporting AsOf joins, including "Clickhouse, DuckDB, BigQuery, Snowflake, QuestDB."
  • refset countered, "AsOf join in those systems solves a rather narrow problem of performance and SQL expressiveness for data with overlapping user-defined timestamps. The bitemporal model solves much broader issues of versioning and consistent reporting whilst also reducing the need for many user-defined timestamp columns."
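The AsOf join that necubi refers to has a narrower, well-defined meaning: for each left-hand row, pick the latest right-hand row whose timestamp is at or before the left row's timestamp. A minimal sketch of those semantics in plain Python (the trade/quote data is invented for illustration):

```python
import bisect

def asof_join(trades, quotes):
    """For each trade (ts, symbol), attach the price of the latest quote
    (ts, price) with quote.ts <= trade.ts. Both inputs sorted by ts."""
    quote_ts = [q[0] for q in quotes]
    out = []
    for ts, symbol in trades:
        i = bisect.bisect_right(quote_ts, ts) - 1  # latest quote at or before ts
        out.append((ts, symbol, quotes[i][1] if i >= 0 else None))
    return out

quotes = [(1, 10.0), (5, 10.5), (9, 11.0)]
trades = [(4, "ACME"), (9, "ACME"), (12, "ACME")]
print(asof_join(trades, quotes))
# [(4, 'ACME', 10.0), (9, 'ACME', 11.0), (12, 'ACME', 11.0)]
```

This matches refset's characterization: it joins one user-defined timestamp column efficiently, but says nothing about versioning or reproducing a past report, which is what the full bitemporal model addresses.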

Clojure's Role and Ecosystem

The initial comment about Clojure being a "hermetic space" sparked a debate about the language's ecosystem and its influence on temporal data concepts. While some felt Clojure was isolated, others pointed to its extensibility and influence.

  • moomin remarked, "It's a pity that Clojure is kind of a hermetic space these days, because the concept of bitemporality really deserves much more attention."
  • lemonberry asked for clarification: "If you don't mind sharing, I'm curious why you feel this way."
  • 0_gravitas offered a counterpoint to "hermetic," noting Clojure's tendency to attach to other ecosystems: "'Hermetic' is an interesting word-choice, considering Clojure has a habit/raison d'etre of attaching to other host languages/ecosystems i.e. Jank (C++), ClojureScript (JS), Basilisp (Python), Phel (PHP), Babashka (general scripting), and of course, Clojure itself on the JVM."
  • adityaathalye later reinforced the community interest: "Really interesting stuff. I learned a lot about this from Clojure/Datomic and think its quite neat that so many Clojurists are interested in and tackling this problem. As the author notes in this post, XTDB is another one."

Practical Implementations and Challenges

Several users shared their real-world experiences implementing temporal data systems, revealing the practical difficulties and lessons learned. These ranged from using PostgreSQL with tstzrange to building append-only log systems.

  • whalesalad detailed a PostgreSQL-based bitemporal system for managing domain names, involving change tables and time travel records. They noted, "All of it works quite well and is decently performant. We can ask questions like, how many domains did we own on March 13th, 2024? Or look at the entire lifecycle of a domains ownership..." However, they also identified significant challenges: "The big challenge and core issue we discovered though is that our data sucks. QAing this new capability has been a moving target. Tons of mistakes over time that were partially undone or manually undone without proper audit trail. Ghost records. Rapid changes by our bulk editor tool a->b->a->b that need to get squashed into just a->b." This led to a desire for "a much better change tracking system that bakes in tools to clean/erase/undo/soft-delete/hard-delete mistakes."
  • adityaathalye advocated for an "everything is a log" approach: "Writes should either be traditional schema -> standard log of all changes, OR, should be 'everything is a log', and the system keeps the 'current' view updated, which is just a special case of the 'as of' query, where 'as of' is always 'now'."
  • whalesalad agreed with the "everything is a log" philosophy: "100%. This is a regret that I have in our lifecycle tracking. We effectively are updating a giant spreadsheet (table) and tracking side effects of those updates. I would much rather work in a log/event based system kinda flipping that on its head, where we track events like domain_renewed or domain_transferred."

Philosophical Debates on Data Modeling

The discussion touched upon fundamental differences between traditional CRUD-based systems and log-centric temporal systems, exploring where complexity truly lies and the trade-offs involved.

  • adityaathalye presented a diagrammatic explanation of the pivot in data modeling: "In both cases the current view can be understood to be a pivot table of the facts log. BUT in the current DB, we must 'pivot' the /process/, i.e. take a CRUD op and transform it into an audit log. This /must/ be done synchronously in real-time. Whereas in the Temporal DB's case, we must 'pivot' the stored data, which we can do at any point in query time, as of any point in time of the log." He concluded, "The complexity of Current DBs derives from /live process management/ challenges. Whereas the complexity of Temporal DBs derives from /retroactive process management/ challenges."
  • mamcx raised concerns about the "everything is a log" approach, emphasizing the benefits of traditional main tables and triggers: "The problem with 'everything is a log' is that is very undisciplined, and trigger from the actual main table have this very serious advantages: * Your main table has the correct shape, and the derived log too! * MOST of the queries are to that and for now"
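mamcx's trigger-based alternative keeps the main table as the source of truth and derives the log from it synchronously, which is the "pivot the process in real time" case in adityaathalye's framing. A minimal SQLite sketch (schema and names are invented for illustration):

```python
import sqlite3

# The main table has "the correct shape"; a trigger derives the audit log
# synchronously on every update, as mamcx advocates.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE domains (name TEXT PRIMARY KEY, owner TEXT);
    CREATE TABLE domains_log (
        name TEXT, old_owner TEXT, new_owner TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER domains_audit AFTER UPDATE OF owner ON domains
    BEGIN
        INSERT INTO domains_log (name, old_owner, new_owner)
        VALUES (NEW.name, OLD.owner, NEW.owner);
    END;
""")
con.execute("INSERT INTO domains VALUES ('a.com', 'us')")
con.execute("UPDATE domains SET owner = 'them' WHERE name = 'a.com'")

print(con.execute("SELECT name, old_owner, new_owner FROM domains_log").fetchall())
# [('a.com', 'us', 'them')]
```

Most queries hit the small, correctly-shaped `domains` table directly, which is mamcx's point; the trade-off is that the log is only as complete as the triggers that maintain it.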

Resources and Future Directions

The conversation also pointed to valuable resources for those interested in bitemporality, including interviews and ongoing research.

  • adityaathalye shared a link to XTDB discussions and interviews with researchers like Rob Squire and Richard Snodgrass.
  • refset provided an unlisted video link to an interview with Professor Snodgrass.

Skepticism Towards Bitemporality

Finally, a dissenting opinion emerged, suggesting that bitemporality might not be universally applicable or desired.

  • thom offered a contrasting view: "Perhaps because of the domains I work in I continue to find bitemporality completely uninteresting. Stop trying to make fetch-as-of happen."