Essential insights from Hacker News discussions

Ultra Ethernet Specification v1.0 [pdf]

Ultra Ethernet: Aiming to Replace Infiniband for HPC and AI

The central theme is Ultra Ethernet (UE) as a potential successor to Infiniband (IB) and RoCE (RDMA over Converged Ethernet) for high-performance computing (HPC) and AI workloads. Commenters see it as a more refined approach to RDMA over Ethernet that addresses the shortcomings of RoCE, with the spec's mapping onto Libfabric (see the sketch after the quotes below) reinforcing that framing.

  • frantathefranta: "Skimming this, it looks like it's trying compete with Infiniband, but combining it with plain Ethernet?"
  • jauntywundrkind: "That so much of the document is about how UltaEthernet maps to Libfabric rather confirms that premise! There's also a credit based flow system, and connection manager roles, which are also easily identifiable Infiniband concepts."
  • jpgvm: "Ultimately it's just a different (IMO slightly more refined) look at how to support RDMA on Ethernet vs RoCE which is a more ham fisted implementation."
  • datadrivenangel: "Ethernet but for AI!"
  • bobmcnamara: "Æthernet!"
  • dist-epoch: "Ethernet + AI"
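As jauntywundrkind notes, much of the spec is expressed as a mapping onto the Libfabric API, which implies that applications already written against libfabric could target UE hardware largely unchanged. The following is a minimal C sketch of generic libfabric endpoint bring-up; the capability flags and API version chosen here are assumptions for illustration, and nothing UE-specific is shown, since provider details are up to vendors:

```c
#include <stdlib.h>
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>
#include <rdma/fi_endpoint.h>

int main(void)
{
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL;
    struct fid_fabric *fabric = NULL;
    struct fid_domain *domain = NULL;
    struct fid_ep *ep = NULL;

    /* Request a reliable datagram endpoint with messaging and RMA
       capabilities; the provider that satisfies the request (verbs for
       IB/RoCE today, a UE provider if and when vendors ship one) is
       chosen by fi_getinfo, not hard-coded by the application. */
    hints->ep_attr->type = FI_EP_RDM;
    hints->caps = FI_MSG | FI_RMA;

    if (fi_getinfo(FI_VERSION(1, 20), NULL, NULL, 0, hints, &info))
        return EXIT_FAILURE;

    /* Open the fabric, a domain (roughly: one NIC), and an endpoint. */
    if (fi_fabric(info->fabric_attr, &fabric, NULL) ||
        fi_domain(fabric, info, &domain, NULL) ||
        fi_endpoint(domain, info, &ep, NULL))
        return EXIT_FAILURE;

    /* ... bind completion queues, enable the endpoint, post sends/RMA ... */

    fi_close(&ep->fid);
    fi_close(&domain->fid);
    fi_close(&fabric->fid);
    fi_freeinfo(info);
    fi_freeinfo(hints);
    return EXIT_SUCCESS;
}
```

The design point is that the provider selected by fi_getinfo, not the application, decides whether the bytes travel over verbs/RoCE, native Infiniband, or, if the spec's libfabric mapping is shipped as a provider, a UE fabric.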

Technical Differences Between Ultra Ethernet and RoCE

The discussion touches on the technical differences between Ultra Ethernet and RoCE, emphasizing UE's use of Infiniband-style credit-based flow control for better predictability and network utilization, whereas RoCE is seen as leaning on Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) for congestion management. A rough sketch of the contrast follows the quotes below.

  • jauntywundrkind: "I think predictability/utilization/latency in RoCE are worse, that it relies on Explicit Congestion Notification more for flow control. Where-as UE is using Infiniband style credit based flow control, which should insure that any data sent has sufficient throughout allocated to it to be received."
  • jpgvm: "RoCE took an encapsulation approach that has some drawbacks (namely it's reliance on PFC/ECN for congestion management)."
  • jpgvm: "This takes a different approach that attempts to actually do first-class re-implementations of Infiniband-ish congestion control with end-to-end credit based flow control similar to Infiniband virtual lanes."
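To make the contrast concrete, here is a minimal, hypothetical C sketch of the two sender-side disciplines described in the quotes above: a credit-based sender that only transmits against buffer credits granted by the receiver, and an ECN-driven sender that transmits optimistically and backs off when congestion notifications arrive. All names and rate-adjustment constants are illustrative only and not drawn from the UE, Infiniband, or RoCE specifications (production RoCE typically runs DCQCN, which is considerably more involved):

```c
#include <stdbool.h>
#include <stdint.h>

/* Credit-based flow control (Infiniband / UE style): the sender may only
 * transmit when the receiver has already granted buffer space, so any
 * data that leaves the NIC has somewhere to land. */
typedef struct {
    uint32_t credits;               /* receive buffers granted by the peer */
} credit_sender;

bool credit_try_send(credit_sender *s)
{
    if (s->credits == 0)
        return false;               /* back-pressure before anything is sent */
    s->credits--;
    /* transmit one packet here */
    return true;
}

void credit_on_return(credit_sender *s, uint32_t granted)
{
    s->credits += granted;          /* receiver freed buffers and granted more */
}

/* ECN-driven congestion control (RoCE style): the sender transmits
 * optimistically and only slows down after switches mark packets and the
 * congestion signal is echoed back, i.e. at least one round trip later. */
typedef struct {
    double rate;                    /* current sending rate, e.g. in Gb/s */
} ecn_sender;

void ecn_on_congestion_notification(ecn_sender *s)
{
    s->rate *= 0.5;                 /* multiplicative decrease on a mark */
}

void ecn_on_quiet_period(ecn_sender *s, double line_rate)
{
    s->rate *= 1.1;                 /* probe upward while no marks arrive */
    if (s->rate > line_rate)
        s->rate = line_rate;
}
```

The structural difference is visible in the sketch: with credits, back-pressure happens before data leaves the sender, while with ECN the sender only learns about congestion after marked packets have already crossed the fabric, which is the predictability and utilization gap the commenters describe.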

Ultra Ethernet Consortium and Timing

The discussion addresses the recent formation of the Ultra Ethernet Consortium (UEC) and clarifies its relationship to the rise of AI. While the UEC was formed in July 2023, the work targets high-performance networking needs that predate the recent AI boom; highlighting AI is seen as a way to win acceptance among less technical decision makers.

  • throw0101d: "UEC was formed/announced in July 2023...ChatGPT was launched in November 2022", linking to press releases and Wikipedia articles.
  • jpgvm: "UltraEthernet predates the AI boom. I don't blame them for adding the AI buzzwords, for this to succeed people need to think about it in the same vein as IB/RoCE/etc and a lot of the decision makers are unfortunately not technical enough to understand what this is or does."
  • altairprime links to a press release and quotes it: "Modern RDMA for Ethernet and IP – Supporting intelligent, low-latency transport for high-throughput environments...Open Standards and Interoperability – Avoids vendor lock-in while accelerating ecosystem-wide innovation...End-to-End Scalability – From routing and provisioning to operations and testing, UEC scales to millions of endpoints."

History of Infiniband and Related Technologies

The conversation explores the history of Infiniband, including Intel's involvement through both native IB switches and Omni-Path, and the consolidation of the IB market around NVIDIA/Mellanox.

  • throw0101d provides links to historical Intel Infiniband switch documentation and an Infiniband alliance member listing.
  • jauntywundrkind: "Intel's Omni-Path was very similar but a little different from Infiniband. Anyone remember any details on how; I'm forgetting?"
  • rincebrain: "Intel bought Qlogic's IB division when they sold it off, in 2012; I believe at one point the former QLogic parts were branded Omni-Path before it diverged. ... Before that, Mellanox ate Voltaire, who was the other large vendor in the IB space. So at this point, I believe NVIDIA's Mellanox devices are the only people selling IB chips these days..."

10GBASE-T and Beyond for Office Environments

A portion of the discussion shifts to the adoption of 10GBASE-T and future networking needs in office environments. The consensus is that 10GBASE-T is sufficient for most office use cases for the foreseeable future, with higher speeds primarily targeting specialized applications. Commenters raise concerns about the cost, power consumption, cabling requirements, and deployment complexity of faster Ethernet standards.

  • lousken: "This is cool, but I am more curious about what happens to all the cat 6a cabling in offices, where do we go after 10GBASE-T"
  • BizarroLand: "10g is going to remain sufficient for office use for a long time...Debbie in accounting will practically never need more than 10g networking to do her job."
  • crote: "Perhaps in 2050 Edward will be mailing Debbie iterations of their 500GB AI fraud model for local inference?...The question isn't if 10G will need to be replaced with something better, but when."
  • crote: "25GBASE-T and 40GBASE-T dead-on-arrival: the spec exists, but nobody ever bothered to actually ship it."
  • phonon: "The Realtek RTL8127 chip for 10GbE will cost $10, draw 1.95 W and is designed for motherboards. So I think you're a bit pessimistic on timelines..." linking to an article, while conceding that widespread uptake is still a ways off.
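For a rough sense of scale on crote's example: 10 Gb/s is about 1.25 GB/s of raw throughput, so moving a 500 GB model would take on the order of 400 seconds, versus roughly 40 seconds at 100 Gb/s; that is the kind of gap that would eventually make a faster office standard attractive.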

Availability and Pricing

Finally, there's a brief inquiry and speculation about the timeline for hardware availability and the potential cost of Ultra Ethernet solutions compared to Infiniband and RoCE. The suggestion is that since Ultra Ethernet is primarily a software addition on existing physical layers, hardware support should be available relatively quickly.

  • anonymousDan: "So how long before hardware is available that supports this spec? Would the kit likely be cheaper than infiniband (or even ROCE) equivalents?"
  • bgnn: "It seems there isn't much difference on physical layer with standard 802.3 100Gb/lane. So in principle this is a software addition. Hardware supporting it should be available quickly."