Essential insights from Hacker News discussions

Netflix Revamps Tudum's CQRS Architecture with RAW Hollow In-Memory Object Store

The Hacker News discussion centers on whether the infrastructure behind Netflix's Tudum website is over-engineered. Commenters debate whether the system's complexity is justified by its actual requirements or stems from other factors, such as organizational structure and engineers' incentives.

Perceived Over-Engineering and Lack of Clear Justification

Much of the discussion is skeptical of the complexity of the Tudum infrastructure, given that Tudum is a content website that one commenter likens to a "WordPress powered PR blog." Many users find the extensive use of technologies like Cassandra, Kafka, and Kubernetes excessive for the task at hand.

  • "Am I naive thinking this infra is overblown for a read-only content website?" - thinkindie
  • "From an outsider's perspective Tudum does seem to be an extremely simple site... But maybe they have complicated use cases for it? I'm also not convinced it merits this level of complexity" - rokkamokka
  • "I cannot imagine why you'd need a reactive pipeline built on top of Kafka and Cassandra to deliver some fanservice articles through a CMS, perhaps some requirement about international teams needing to deliver tailored content to each market but even with that it seems quite overblown." - piva00
  • "At least based on my experience. Most of the tech infrastructure out there is over engineered." - dakiol
  • "This is the only reasonable take in your rant, but the reasoning is off for even this. They have little reason, because they will never hit the scale Netflix operates at. In the very very odd chance they do, they will have ample money to care about it." - gf000
  • "I don't know any of the details, but they seem to have moved a lot of their internal stuff to Hollow. So maybe it's just an attempt at unification of the tech stack, rather than a concrete need." - gf000
  • "I don’t normally comment on technical complexity at scale, but having used Tudum, it is truly mind boggling why they need this level of complexity and architecture for what is essentially a WP blog equivalent." - yunohn
  • "Holy shit the amount of overcomplications to serve simple HTML and CSS. Someone really has to justify their job security to be pulling shit like this, or they really gotta be bored." - thecupisblue
  • "This has to be one of the most over-engineered websites out there." - moralestapia
  • "Isn't Tudum mostly a static site? It must be a great project to try out cool stuff on, with a near zero chance of that cool stuff making it to the main product and having a significant impact on customers. Most of the traffic probably comes from bots." - mkl95
  • "Concerning that their total uncompressed data size including full history is only 520MB and they built this complex distributed system rather than, say, rsyncing an sqlite database." - immibis
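
To make immibis's point concrete, here is a minimal sketch of what "a SQLite database" could look like for a content site of this size. It is an illustration only: the `pages` table, its columns, and the helper names `build_store` and `get_page` are hypothetical, nothing here reflects Netflix's actual design, and it does not address the personalization requirements raised elsewhere in the thread.

```python
import sqlite3


def build_store(rows):
    """Load (slug, locale, body) rows into an in-memory SQLite database.

    The schema is a hypothetical example, not Tudum's real data model.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (slug TEXT, locale TEXT, body TEXT, "
        "PRIMARY KEY (slug, locale))"
    )
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def get_page(conn, slug, locale, fallback_locale="en"):
    """Return the page body for slug in locale, else in fallback_locale, else None."""
    for loc in (locale, fallback_locale):
        row = conn.execute(
            "SELECT body FROM pages WHERE slug = ? AND locale = ?",
            (slug, loc),
        ).fetchone()
        if row is not None:
            return row[0]
    return None


# Made-up sample content for demonstration.
store = build_store([
    ("stranger-things-recap", "en", "<h1>Recap</h1>"),
    ("stranger-things-recap", "pt", "<h1>Resumo</h1>"),
])
```

In a real deployment the database would live in a file that is copied to each web server (the "rsync" half of the comment) rather than in `:memory:`; the lookup code would stay the same.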

Microservices Justification: Organizational vs. Performance

A recurring point of discussion is the rationale behind using microservices. While some associate them with performance, others emphasize their role in addressing organizational challenges, such as enabling independent deployment and faster iteration for teams.

  • "Microservices for organizational challenges. Lots of people think microservices = performance gains only. It’s not. It’s mainly for organizational efficiency. You can’t be blocked from deploying fixes or features. Always be shipping. Without breaking someone else’s code." - moomoo11
  • "In fact, it's opposed to performance in many cases. And dev velocity for small teams." - boxed
  • "In the end it will be a technical solution to an organisational issue, some parts of their infrastructure might be rigid and there are teams working around that instead of with that..." - piva00

Potential Underlying Complexities and Justifications

Despite the general sentiment of over-engineering, some users suggest potential reasons for the complex setup, including personalization requirements, handling content updates, and the integration with existing, possibly rigid, infrastructure.

  • "I’m gonna take a wild guess: the actual problem they’re engineering around is the “cloud” part of the diagram (that the “Page Construction Service” talks to) There is probably some hilariously convoluted requirement to get traffic routed/internal API access. So this thing has to run in K8s or whatever, and they needed a shim to distribute the WordPress page content to it." - pram
  • "Reading the article I got the impression the big challenge is doing "personalization" of the content at scale. If it were "just" static pages, served the same to everyone, then it's pretty straightforward to implement even at the >300m users scale Netflix operates at. If you need to serve >300m different pages, each built in real-time with a high-bar p95 SLO then I can see it getting complicated pretty quickly in a way that could conceivably justify this level of "over engineering"." - Joe8Bit
  • "From what I’ve seen at workplace is that trip to this event is equivalent of corporate junket for mid-level developers who happened to be manager's favorite." - geodel (This quote is a tangent about re:Invent, but it touches on potential motivations for complex projects.)
  • "Characterizing Netflix as a "read-only" website is incredibly shortsighted considering:
      - a constantly changing library across constantly changing licensing regions available in constantly changing languages
      - collaborative filtering with highly personalized recommendation lists, some of which you just know has gotta be hand-tuned by interns for hyper-demographic-specific region splits
      - the incredible amounts of logistics and layers upon layers of caching to minimize centralized bandwidth to serve that content across wildly different network profiles
    I think that even the single-user case has mind boggling complexity, even if most of it boils down to personalization and infra logistics." - gcr
  • "This sort of personal opinion reads like a cliche in software development circles: some rando casually does a drive-by system analysis, cares nothing about requirements or constraints, and proceeds to apply simplistic judgement in broad strokes. And this is then used as a foundation to go on a rant regarding complexity. This adds nothing of value to any conceivable discussion." - motorest
  • "If you read the overview, Tudum has to support content update events that target individual users and need to be templated. How do you plan on generating said HTML and CSS? If you answer something involving a background job, congratulations you're designing Tudum all over again. Now wait for opinionated drive-by critics to criticize your work as overcomplicated and resume-driven development." - motorest
  • "The common solution is to spin up a dedicated DNS hostname called something like "preview.www.netflix.com" and turn off all caching when users go via that path. Editors and reviewers use that, and that's... it. Solved!" - jiggawatts (This is a counter-proposal to the complexity for previewing changes).
  • "When new people join your team and learn your infrastructure, I bet they often ask ”why is this so complicated? It’s just a .” And your response is surely “Well of course, that would be nice, but it’s not as simple as that. Here are constraints X Y and Z that make a trivial solution infeasible.”" - shermantanktop
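
jiggawatts's preview-hostname proposal can be sketched as a small routing rule. This is a hedged illustration rather than a description of any real Netflix setup: the hostname is the commenter's own hypothetical, while the function name `cache_headers` and the 300-second public TTL are assumptions made for this example.

```python
# Hypothetical preview hostname, taken from the comment above.
PREVIEW_HOST = "preview.www.netflix.com"
# Assumed cache lifetime for public traffic; not a value from the source.
PUBLIC_MAX_AGE_SECONDS = 300


def cache_headers(host_header):
    """Return response cache headers based on the request's Host header.

    Requests via the preview host are never cached, so editors and reviewers
    always see fresh content; all other requests get a cacheable response.
    """
    # Drop an optional ":port" suffix and normalize case before comparing.
    host = host_header.split(":", 1)[0].strip().lower()
    if host == PREVIEW_HOST:
        return {"Cache-Control": "no-store"}
    return {"Cache-Control": f"public, max-age={PUBLIC_MAX_AGE_SECONDS}"}
```

The design choice is that freshness is decided by the path the request takes, not by the content itself, so published pages keep their caching while previews bypass it.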

Motivations Beyond Technical Necessity

Some users speculate that motivations other than pure technical requirements might be at play, such as building developer CVs, showcasing new technologies, or accommodating a large workforce of engineers.

  • "Alternative idea: the actual problem they’re engineering around is their developers CVs" - sunrunner
  • "Doing weird pointlessly complicated stuff on a niche area of your website is a not entirely ridiculous way to try out new things and build new skills I guess." - __alexs
  • "Not naive but perhaps missing that army of enterprise Java developers that Netflix employ do seem to need to justify their large salaries by creating complex architecture to handle future needs." - geodel
  • "Many expensive big ego engineers that want to feel useful with PMs to match." - Voultapher
  • "Mom the astronauts are back at it again" - porridgeraisin

Comparisons and Criticisms of Past Practices

The discussion also touches on past experiences with Netflix's technical presentations and perceived patterns of over-engineering, drawing parallels to previous events and technologies.

  • "I remember being interested in their architecture when I attended re:Invent in 2018... All four speakers ran the exact same slide deck with a different intro slide. All four speakers claimed the same responsibility for the same part of the architecture. I was livid. I also stopped attending talks in person entirely because of this, outside of smaller more focused events." - busterarm
  • "That is because they are and it seems that since they're making billions and are already profitable, they're less likely to change / optimize anything. Netflix is stuck with many Java technologies with all their fundamental issues and limitations. Whenever they need to 'optimize' any bottlenecks, their solution somehow is to continue over-engineering their architecture over the most tiniest offerings (other than their flagship website)." - rvz
  • "This brings back old memories, from when they released the first version of Hollow (2016, I think) and I started writing a port in .Net because I thought it would be useful for some projects I was working on." - spyrefused

Misconceptions about Scale and Tudum

Some commenters confuse the subject of the article with the core Netflix streaming service. Several users clarify that Tudum is a separate site operating at a much smaller scale, which further fuels the criticism of over-engineering.

  • "They are not talking about the Netflix streaming service. https://www.netflix.com/tudum This is a site they are talking about which is very similar to a WordPress powered PR blog." - miyuru
  • "This blog about an architecture change is about the Tudum website specifically, not the whole of Netflix." - rpsw
  • "And your response is surely “Well of course, that would be nice, but it’s not as simple as that. Here are constraints X Y and Z that make a trivial solution infeasible.” It's 500 MB of text. A phone could serve that at the scale we're talking about here, which is a PR blog, not netflix.com." - jiggawatts
  • "Also this isn't "Netflix scale", Tudum is way less popular." - thecupisblue
  • "I hadn't even heard of it until today." - sunrunner
  • "I thought the article was about some internal Netflix piece of infra (well ... it is in some way) but it really is some website for some annual event ... wow." - moralestapia