Essential insights from Hacker News discussions

SQLite's File Format

SQLite's Pervasiveness and Design Philosophy

A strong theme in the discussion is SQLite's remarkable success and pervasiveness, often attributed to its simple, single-file design, which makes it easy to use and integrate into a vast range of applications, from phones and everyday devices to AI systems. Despite perceived design shortcomings, it's frequently cited as the best "initial choice" for most projects.

  • "Usually SQLite will hit the maximum file size limit of the underlying filesystem or disk hardware long before it hits its own internal size limit." (mockingloris)
  • "Wondered the same thing. That's a lot of data for just one file! Did a full-day deep dive into SQLite a while back; funny how one tiny database runs the whole world—phones, AI, your fridge, your face... and like, five people keep it alive. Blows my mind." (mockingloris)
  • "SQLite is a great example of a single factor mattering more than everything else combined. A database contained in a single file is such a good idea that it outweighs a poorly designed storage layer, poorly designed column formats, and a terrible SQL implementation." (alphazard)
  • "If craftsmanship is measured by the long tail of good choices that give something a polished and pristine feel, then SQLite was built with none of it. And yet, it's by far the best initial choice for every project that needs a database. Most projects will never need to switch to anything more." (alphazard)
  • "Dr. Hipp has said several times that nobody expected a weakly-typed database to achieve the pervasiveness that is observed with SQLite." (chasil)
  • "Typically, the Lowest-Common-Denominator wins mass appeal/uasge. By not having safety checks and even typing enforcement, SQLite caters to actually more use cases than less." (alberth)

The Role of Documentation and Testing

The thorough documentation and extensive test suite of SQLite are highlighted as critical factors in its success, arguably more so than a theoretically "perfect" but less robust design. This emphasis on quality assurance and clear internal workings fosters trust and usability.

  • "I certainly do appreciate that the file format internals are so well documented here. It really reveals a lot of information about the inner workings of sqlite itself. I highly recommend reading it; I actually saved a copy for a rainy day sometime and it was very insightful and absolutely influenced some design decisions using sqlite in the future." (adzm)
  • "It apparently has an extensive and thorough test suite. That's an excellent design choice that tons of other projects could learn from, and is probably a key element of its success." (pmarreck)
  • "Sometimes a poorly-designed thing that is excellently-documented and thoroughly-tested is better than a brilliantly-designed thing that is lacking in those. In fact, unless the number of users of the thing is 1 (the creator), the former is likely a better option across all possible use-cases." (pmarreck)
  • "Perhaps we could generalize this by stating that determinism > pareto-optimality." (pmarreck)

SQLite as a Federal Standard and Versioning Concerns

The fact that SQLite's file format is a U.S. federal standard (recommended by the Library of Congress, LOC) is noted, implying a constraint on its evolution. While one user suggests simply incrementing the version, others point out the significant complexity and maintenance burden this would introduce for the existing installed base, including API and SQL dialect compatibility and maintaining DO-178B status.

  • "The format itself is a U.S. federal standard, and cannot be changed. That has advantages and drawbacks." (chasil)
  • "I assume the SQLite team could increment the version to 4 if they really needed to, and leave the LOC to update (or not) their recommendation, which specifies version 3." (justin66)
  • "However, a significant fraction of the current installed base would not upgrade, requiring new feature development for both versions. The test harness would also need implementations for both versions. Then the DO-178B status would would need maintenance for both. That introduces significant complexity." (chasil)
  • "Compared to the amount of SQLite database files in the world only few are shared between different applications. If there is an upgrade path most won't notice. The bigger issue imo is API and SQL dialect compatibility." (johannes1234321)

Database Size Limits, Filesystems, and Practicality

The theoretical maximum database size for SQLite is discussed, with users questioning whether anyone has ever actually hit it. The conversation shifts to the practicalities of modern storage: users share examples of massive filesystems and increasingly affordable large-scale storage that make the theoretical ceiling seem less impactful than filesystem or hardware constraints. (A quick sanity check of the quoted ceiling appears after the quotes below.)

  • "The maximum size database would be 4294967294 pages at 65536 bytes per page or 281,474,976,579,584 bytes (about 281 terabytes)." (mockingloris)
  • "Usually SQLite will hit the maximum file size limit of the underlying filesystem or disk hardware long before it hits its own internal size limit." (mockingloris)
  • ""Usually"? I'm not saying there are literally no computers in existence that might have this much space on a single filesystem, but...has there ever been a known case of someone hitting this limit with a single SQLite file?" (saghm)
  • "The largest SQLite file I could find is 600GB https://www.reddit.com/r/learnpython/comments/1j8wt4l/workin... The largest filesystems I could find are ~1EB and 700PB at Oak Ridge." (dahart)
  • "FWIW, I took the ‘usually’ to mean usually the theoretical file size limit on a machine is smaller than theoretical SQLite limit. It doesn’t necessarily imply that anyone’s hit the limit." (dahart)
  • "That's just 10 30TB HDDs. Throw in two more for redundancy and mount them in a single zfs raidz2 (a fancy RAID6). At about $600 per drive that's just $7200. Half that if you go with 28TB refurbished drives (throw in another drive to make up for lost capacity). That is in the realm of lots of people's hobby projects (mostly people who end up on /r/datahoarder). If you aren't into home-built NAS hardware you can even do this with stock Synology or QNAP devices The limit is more about how much data you want to keep in sqlite before switching to a "proper" DBMS." (wongarsu)
  • "Last week I threw together a 840TB system to do a data migration. $1500 used 36-bay 4U, 36 refurbished Exos X28 drives, 3x12 RAIDz2. $15000 all in." (mastax)
  • "I don't use it for sqlite, but having multi-petabyte filesystems, in 2025, is not rare." (dmd)
  • "The kioxia lc9 is sold with capacities up to 245TB, so we are like 1 year max away from having a single disk with more than 281TB" (yread)

Weak Typing and Date/Time Handling

SQLite's weakly-typed nature is discussed as both a reason for its broad adoption and a potential drawback. While it caters to more use cases by being less opinionated, issues like date and time handling, which are punted to limited libc facilities, are identified as a persistent challenge that can complicate updates and consistency. (A short demonstration of the type flexibility, and of the STRICT tables mentioned below, follows the quotes.)

  • "Dr. Hipp has said several times that nobody expected a weakly-typed database to achieve the pervasiveness that is observed with SQLite. At the same time, strict tables address some of the concern of those coming from conventional databases. Dates and times are a core problem to SQLite not seen elsewhere as far as I know, but this does evade UTC and constantly shifting regional time. My OS gets timezone updates every few months, and avoiding that had foresight. Default conformance with Postel's Law is SQLite's stance, and it does seem to work with the ANSI standard." (chasil)
  • "While nobody expected it … it should not be unexpected. Typically, the Lowest-Common-Denominator wins mass appeal/uasge. By not having safety checks and even typing enforcement, SQLite caters to actually more use cases than less." (alberth)
  • "SQLite probably doesn't do anything with times and dates except punting some functions to the limited libc facilities because including any proper date-time facilities would basically double the footprint of SQLite. Same for encodings and collations." (formerly_proven)
  • "I think it's one of the reasons DuckDB has seen the popularity that it has." (da_chicken)
  • "It was designed to be a DB for Tcl at a time when that language didn't have typed objects. Its SQL implementation reflects that. Where are the grand Python, or Perl, or JS DBs?" (kevin_thibedeau)
  • "Is this opinion shared by others?" (Jabbles)

Alternative Formats and Network Storage

The discussion touches on alternative data formats like Parquet and their advantages, particularly in cloud-native or big-data contexts. The compatibility and pitfalls of using SQLite on network storage, especially for read-write workloads, are also explored, with varying user experiences and official recommendations urging caution. (A sketch of the HTTP range-request trick mentioned in the first quote follows the list.)

  • "The neatest thing i seen is you can put a sqlite db on a http server and read it effectively using range requests" (cyanydeez)
  • "I'm not saying there are literally no computers in existence that might have this much space on a single filesystem, but...has there ever been a known case of someone hitting this limit with a single SQLite file?" (saghm)
  • "In my experience, this works when the db is read only. And in these read only cases I'd use Parquet files queried with Duckdb Wasm." (simlevesque)
  • "Sometimes I ask myself with we could do a better file format, something like parquet but row-oriented" (SchwKatze)
  • "Any recommendations from HN for a write-once (literally once), data storage format that's suitable for network storage? sqlite docs recommend avoiding using it on network storage, though from what I can gather, it's less of an issue if you're truly only doing reads (meaning I could create it locally and then copy it to network storage). Apache Parquet seems promising, and it seems to support indexing now which is an important requirement." (kayson)
  • "SQLite works fine over read-only NFS, in my experience. Just only work on an immutable copy and restart your application if ever changing it. If your application is short lived and can only ever see an immutable copy on the path, then it is a great solution." (mcculley)
  • "SQLite does work on NFS even in read-write scenario. Discovered by accident, but my statement still holds. The WAL mode is explicitly not supported over network filesystems, but I guess you don't expect it to :)" (nasretdinov)
  • "My experience has been the opposite... Lots of db lock and corruption issues. The FAQ doesn't call out WAL specifically, just says don't do it at all: https://www.sqlite.org/faq.html#q5" (kayson)
  • "Parquet files are what I use." (simlevesque)
  • "Multiple writers on network storage is the issue. Reading should be totally fine." (pstuart)

Page Size Representation and Hardware Specifics

A minor point of discussion arises around the encoding of SQLite's page size in the file header: it is stored as a 16-bit byte count that must be a power of two, so the maximum size of 65536, which does not fit in 16 bits, is represented by the special value 1. This leads to a brief consideration of why a logarithmic representation would have been more logical, and of the potential for future implementations to handle non-power-of-two sector sizes on some hardware. (A parsing sketch follows the quotes.)

  • "What an odd design choice. Why not just have the value be the base 2 logarithm of the page size, i.e. a value between 9 and 16?" (lisper)
  • "If I had to guess this field was specified before page sizes of 65536 were supported. And at that point using the value 1 for page sizes of 65536 made the most sense." (kevincox)
  • "There exists hardware with non-power-of-two disk sector sizes. Although sqlite's implementation requires powers-of-two today, a future implementation could conceivably not. Representing 64k was presumably an afterthought." (Retr0id)

Historical Context and Other Databases

The discussion briefly touches on historical database systems, such as DEC's Rdb, which also ran as a single file and introduced concepts like cost-based optimization. This provides a comparative perspective on the evolution of database technology and the enduring appeal of simpler data storage models.

  • "Digital Equipment Corporation sold a SQL database known as Rdb that could also run as a single file. It was the first database to introduce a cost-based optimizer, and ran under both VMS and Digital UNIX. Oracle bought it, and VMS versions are still supported." (chasil)
  • "Oh! RDB was the first database I worked with. I forgot all about it. I do remember refactoring the data layer so that it also worked with Berkeley DB, which is also owned by Oracle now. Or maybe it was the other way around? There was no SQL involved in that particular application so it was just a K/V store. Working with a local data file was the primary design goal, no client/server stuff was even on the radar. SQLite would have been perfect if it had existed." (owyn)
  • "Firebird also fits the bill, I think, but never took off. Firebird even supports client-server deployments." (christophilus)