Essential insights from Hacker News discussions

Gigabyte CXL memory expansion card with up to 512GB DRAM

The discussion revolves around the introduction of a new CXL (Compute Express Link) memory expansion card, prompting a wide range of opinions and technical debate. Here are the key themes:

Excitement about Increased Memory Capacity

The discussion begins with users expressing enthusiasm for the substantial increase in memory capacity offered by the new CXL technology. The ability to go beyond typical consumer limits of 32GB or 64GB to 512GB or even terabytes is seen as a significant advancement.

  • roscas: "That is amazing. Most consumer boards will only have 32 or 64. To have 512 is great!"
  • cjensen: "Both of the supported motherboards support installation of 2TB of DRAM."

PCIe Bandwidth as a Key Factor

A recurring point of interest is the role of PCIe bandwidth in CXL's performance. Users want to know how the speed of PCIe lanes, particularly PCIe 5.0, will affect memory access and, consequently, application performance, especially for tasks like AI inference; a back-of-the-envelope check follows the quotes below.

  • reilly3000: "Presumably this is about adding more memory channels via pcie lanes. I’m very curious to know what kind of bandwidth one could expect with such a setup, as that is the primary bottleneck for inference speed."
  • Dylan16807: "The raw speed of PCIe 5.0 x16 is 63 billion bytes per second each way. Assuming we transfer several cache lines at a time the overhead should be pretty small, so expect 50-60GB/s. Which is on par with a single high-clocked channel of DRAM."
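
Dylan16807's figure is easy to sanity-check: PCIe 5.0 signals at 32 GT/s per lane with 128b/130b encoding, so a x16 link carries roughly 63 GB/s of raw payload in each direction. The short C program below reproduces the arithmetic; it models only the line encoding, not CXL flit or header overhead:

```c
/* Back-of-the-envelope check of the PCIe 5.0 x16 figure quoted above.
 * Models only the 128b/130b line encoding; CXL flit and header
 * overhead (which Dylan16807 folds into his 50-60GB/s practical
 * estimate) is ignored. */
#include <stdio.h>

int main(void) {
    double gt_per_lane = 32.0;           /* PCIe 5.0: 32 GT/s per lane */
    double encoding    = 128.0 / 130.0;  /* 128b/130b line encoding */
    double lanes       = 16.0;

    double gbytes_per_s = gt_per_lane * encoding * lanes / 8.0;
    printf("raw PCIe 5.0 x16 payload: %.1f GB/s each way\n",
           gbytes_per_s);                /* prints ~63.0 */
    return 0;
}
```

Real CXL traffic pays protocol overhead on top of the line encoding, which is where the 50-60GB/s practical estimate comes from.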

CXL's Potential for Shared Memory and Enhanced Cluster Computing

Several users highlight the transformative potential of CXL for building shared memory systems and improving cluster computing. The ability for multiple nodes to access a single, cache-coherent dataset in shared CXL memory is seen as a significant advantage for data-intensive applications like large hash joins; a mapping sketch follows the quotes below.

  • tanelpoder: "So you could, for example, cache the entire build-side of a big hash join in the shared CXL memory and let all other nodes performing the join see the single shared dataset. Or build a „coherent global buffer cache“ using CPU+PCI+CXL hardware, like Oracle Real Application Clusters has been doing with software+NICs for the last 30 years."
  • tanelpoder: "CXL arrays might become something like the SAN arrays in the future - with direct loading to CPU cache (with cache coherence) and being byte-addressable."
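
As a concrete (and heavily simplified) illustration of the shared-dataset idea, the sketch below maps a CXL memory region into a process, assuming the region is exposed to Linux as a DAX character device. The device path /dev/dax0.0 and the 1 GiB size are illustrative assumptions, and true multi-host sharing would additionally depend on CXL 2.0/3.x switch and fabric support:

```c
/* Hypothetical sketch: attach to a CXL memory region exposed as a
 * Linux DAX character device and use it as a shared, byte-addressable
 * buffer (e.g. the build side of a hash join). The device path and
 * size are illustrative assumptions. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE (1ULL << 30)            /* assume a 1 GiB region */

int main(void) {
    int fd = open("/dev/dax0.0", O_RDWR);   /* assumed device node */
    if (fd < 0) { perror("open"); return 1; }

    /* MAP_SHARED: every process that maps the same region sees the
     * same bytes; on real hardware, coherence across hosts would be
     * the CXL fabric's job, not this code's. */
    uint64_t *region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    region[0] = 42;     /* plain loads/stores, cached by the CPU */

    munmap(region, REGION_SIZE);
    close(fd);
    return 0;
}
```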

Cost of High-Capacity Memory and Early Adoption Challenges

The cost of high-capacity DDR5 RDIMMs and CXL memory modules is a significant concern. Users note that these components are currently very expensive, potentially limiting widespread adoption, and that many initial use cases might involve leveraging existing DDR4 memory. There's also a sentiment that CXL has been "all announcements and predictions" for years, with limited actual consumer availability.

  • justincormack: "You havent seen the price of 128GB DDR5 RDIMMs, they are maybe $1300 each."
  • kvemkon: "Micron DDR5-5600 for 900 Euro (without VAT, business)."
  • tanelpoder: "Yeah I saw the same. I've been keeping an eye on the CXL world for ~5 years and so far it's 99% announcements, unveilings and great predictions. But the only CXL cards a consumer/small business can buy are some experimental-ish 64GB/128GB cards that you can actually buy today. Haven't seen any of my larger clients use it either."

CXL as a Cleaner Alternative and Potential for Memory Tiering

CXL is viewed as a more elegant solution than older methods of extending memory capacity, such as plumbing an FPGA through a CPU socket. Its support for memory tiering is also recognized as a key benefit, allowing different memory types to be used where they fit best; a placement sketch follows the quotes below.

  • bobmcnamara: "CXL seems so much cleaner than the old AMD way of plumbing an FPGA through the second CPU socket."
  • Twirrim: "On the positive side, you can scale out memory quite a lot, fill up PCI slots, even have memory external to your chassis. Memory tiering has a lot of potential."
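
On Linux, a CXL expander typically shows up as a CPU-less NUMA node, so tiering can begin with ordinary NUMA placement. Below is a minimal sketch using libnuma; treating node 1 as the CXL-backed node is an assumption (check numactl -H on the actual machine):

```c
/* Minimal tiering sketch with libnuma, assuming the CXL expander
 * appears as CPU-less NUMA node 1 (verify with `numactl -H`).
 * Build with: gcc tier.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0 || numa_max_node() < 1) {
        fprintf(stderr, "need NUMA support and at least two nodes\n");
        return 1;
    }
    size_t sz = 1UL << 20;                     /* 1 MiB per buffer */
    char *hot  = numa_alloc_onnode(sz, 0);     /* near DRAM */
    char *cold = numa_alloc_onnode(sz, 1);     /* assumed CXL node */
    if (!hot || !cold) { fprintf(stderr, "allocation failed\n"); return 1; }

    memset(hot, 0, sz);    /* hot path: low-latency local DRAM */
    memset(cold, 0, sz);   /* cold path: capacity tier over CXL */

    numa_free(hot, sz);
    numa_free(cold, sz);
    return 0;
}
```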

Latency Considerations and Performance Trade-offs

The latency that CXL adds is a significant factor in the discussion. Users acknowledge that CXL memory is much faster than SSDs, but its latency relative to on-motherboard RAM must be weighed carefully in application design; a microbenchmark sketch follows the quotes below.

  • Twirrim: "On the negative side, you've got latency costs to swallow up. You don't get distance from CPU for free (there's a reason the memory on your motherboard is as close as practical to the CPU) ... CXL spec for 2.0 is at about 200ns of latency added to all calls to what is stored in memory, so when using it you've got to think carefully about how you approach using it, or you'll cripple yourself."
  • GordonS: "Huh, 200ns is less than I imagined; even if it is still almost 100x slower than regular RAM, it's still around 100x faster than NVMe storage."
  • tanelpoder: "Yup, for best results you wouldn't just dump your existing pointer-chasing and linked-list data structures to CXL... But CXL-backed memory can use your CPU caches as usual and the PCIe 5.0 lane throughput is still good, assuming that the CXL controller/DRAM side doesn't become a bottleneck. So you could design your engines and data structures to account for these tradeoffs."
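
tanelpoder's warning about pointer chasing can be made tangible with a classic latency microbenchmark: each load depends on the previous one, so any added per-access latency (such as CXL 2.0's quoted ~200ns) is paid serially on every hop. The self-contained sketch below builds a random single-cycle permutation (so the prefetcher can't hide the latency) and times the chase; running it with the buffer bound to a DRAM node versus a CXL node, e.g. under numactl, would show the difference:

```c
/* Toy pointer-chasing microbenchmark: each load depends on the
 * previous one, so added memory latency is paid on every hop. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1u << 22)    /* ~4M entries = 32 MB, well past the caches */

int main(void) {
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;

    /* Sattolo's algorithm: a single-cycle random permutation, so the
     * chase touches every entry and the prefetcher can't help. */
    for (size_t i = 0; i < N; i++) next[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t i = 0; i < N; i++) p = next[p];   /* dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per hop (sink=%zu)\n", ns / N, p);

    free(next);
    return 0;
}
```

With local DRAM usually well under 100ns per hop, an extra ~200ns per access would make each hop several times slower, which is why scan- and batch-friendly layouts suit CXL better than linked structures.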

CXL Ecosystem and Hardware Support

The discussion touches upon the hardware ecosystem supporting CXL, including CPUs from Intel and AMD, as well as memory chips and switching hardware from companies like Marvell, Samsung, and Xconn. CXL networking, however, is noted as still being in the R&D stage.

  • afr0ck: "Meta has already rolled out some CXL hardware for memory tiering. Marvell, Samsung, Xconn and many others have built various memory chips and switching hardware up to CXL 3.0. All recent Intel and AMD CPUs support CXL."
  • afr0ck: "CXL uses the PCIe physical layer, so you just need to buy hardware that understands the protocol, namely the CPU and the expansion boards. AMD Genoa (e.g. EPYC 9004) supports CXL 1.1 as well as Intel Sapphire Rapids and all subsequent models do."
  • wmf: "CXL networking is still in the R&D stage."

Novel Use Cases and Adaptability of Existing Software Stacks

Users speculate on various use cases, including using CXL memory as a "gigantic swap space" or for direct data transfers between CXL devices and GPUs. There's also a belief that database management systems (DBMSs) are well-positioned to adapt to CXL's tiered memory structure, given their long history of managing storage with widely different access times; a page-migration sketch follows the quotes below.

  • samus: "DBMSs have been managing storage with different access times for decades and it should be pretty easy to adapt an existing engine. Or you could use it as a gigantic swap space."
  • cr8: "pcie devices can also do direct transfers to each other - if you have one of these and a gpu its relatively quick to move data between them without bouncing through main ram"
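
Linux already offers one building block for the DBMS-style tiering samus describes: move_pages(2) migrates individual pages between NUMA nodes at runtime, which is how a buffer manager could demote cold pages to a CXL-backed node. A minimal sketch, again assuming the CXL memory appears as node 1:

```c
/* Hypothetical demotion sketch: migrate a "cold" page to a
 * CXL-backed NUMA node with move_pages(2), the same mechanism a
 * buffer manager could use for tiering. Node 1 is an assumption.
 * Build with: gcc demote.c -lnuma */
#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    long page_size = sysconf(_SC_PAGESIZE);
    void *page = NULL;
    if (posix_memalign(&page, page_size, page_size)) return 1;
    memset(page, 0xAB, page_size);       /* fault it in on local DRAM */

    void *pages[1]  = { page };
    int   nodes[1]  = { 1 };             /* assumed CXL NUMA node */
    int   status[1] = { -1 };

    /* pid 0 = this process; MPOL_MF_MOVE migrates the page. */
    if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) != 0)
        perror("move_pages");
    else
        printf("page now on node %d\n", status[0]);

    free(page);
    return 0;
}
```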

Historical Context and the Evolution of Memory Expansion

Some users provide historical context, referencing older technologies like I-RAM and Texas Memory Systems' RamSan products, as well as Intel's Optane, to illustrate that the concept of expanding memory via PCIe is not entirely new, though CXL represents a more standardized and advanced approach.

  • gertrunde: "Such things have existed for quite a long time... For example: ... I-RAM ... And then there are the more exotic options, like the stuff that these folk used to make: Texas Memory Systems - iirc - Eve Online used the RamSan product line..."
  • tanelpoder: "Optane memory modules also present themselves as separate (memory only) NUMA nodes. They’ve given me a chance to play with Linux tiered memory, without having to emulate the hardware for a VM"

The "AI" Marketing Hype

One user dismisses the "AI" marketing applied to this CXL product as silly, reading it as a sign of how strange industry trends have become.

  • eqvinox: "The 'AI' marketing on this is positively silly (and a good reflection of how weird everything has gotten in this industry.)"