The packaging of the major components of a computer, relatively unchanged for decades, is undergoing a revolution. Having reached the limits of cooling and bandwidth between memory and CPU, for example, the industry is looking to new schemes to increase performance and use less power.
Leading the chase for the past two years has been the Hybrid Memory Cube (HMC), a Micron concept that has been taken up by a sizable consortium of industry leaders. The concept replaces the traditional parallel DRAM bus with a set of high-speed serial connections while bringing memory and compute chips much closer together physically, which allows removal of the power-hungry transceivers that drove the DRAM bus. The resulting module cuts memory power by between 70% and 90% and takes bandwidth into the 160 GB/s range today, both of which are spectacular improvements. Future configurations aim to more than double that figure and eventually reach a terabyte per second using multiple memory modules.
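The headline bandwidth follows directly from the link arithmetic. A minimal sketch, assuming a first-generation HMC configuration of four full-duplex serial links, each 16 lanes wide at 10 Gb/s per lane (treat the exact lane counts and rates as illustrative rather than definitive):

```python
# Rough aggregate-bandwidth arithmetic for a serial-link memory module
# such as HMC. All parameters are illustrative first-generation values.
LINKS = 4            # full-duplex serial links per cube (assumed)
LANES_PER_LINK = 16  # lanes in each direction (assumed)
GBPS_PER_LANE = 10   # signalling rate per lane, Gb/s (assumed)

# Aggregate raw bandwidth, counting both directions, converted to GB/s
raw_gbps = LINKS * LANES_PER_LINK * GBPS_PER_LANE * 2
raw_gbs = raw_gbps / 8
print(raw_gbs)  # 160.0, matching the GB/s figure quoted above
```

Doubling the per-lane rate, as later HMC generations did, is what "more than doubling" the headline figure amounts to.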
The enabling technology is the through-silicon via (TSV). Stacking memory die on a logic die and using TSVs to connect the memory layers down to the logic yields a very small footprint with a large number of parallel links in use. Today's products typically stack four die, so capacity is limited to 16 GB per module.
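The capacity ceiling is simply die count times per-die density. A sketch, assuming 4 GB per DRAM die (a figure chosen only to be consistent with the 16 GB module capacity stated above):

```python
# Stack capacity = die per stack x capacity per die.
DIE_PER_STACK = 4  # typical stack height, per the text
GB_PER_DIE = 4     # assumed density, chosen to match the 16 GB figure
print(DIE_PER_STACK * GB_PER_DIE)  # 16 GB per module
```

Raising capacity therefore means either denser die or taller stacks, and taller stacks bring their own thermal and yield problems.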
The logic layer can be a CPU, a GPU, an FPGA, or just control logic, and all of these options are beginning to appear. Let's look at some of the applications. The CPU option seems obvious: building DRAM on top of a CPU and delivering the result as a slightly thicker hybrid chip has interesting applications in smartphones and tablets as a way to save precious space and, more importantly, scarce power. With 16 GB solutions already available, this is a viable proposition, and higher-density packages will cover the whole market spectrum.
In servers, the higher bandwidth is the attraction. It isn't yet clear whether the market will go down the CPU/memory-stack pathway or instead opt for tight side-by-side packaging with multiple memory stacks, which could push bandwidth into the 500+ GB/s range and increase HMC capacity. Intel's Knights Landing Xeon Phi chip, as one example, uses a stacked memory structure.
The two big GPU manufacturers, AMD and NVIDIA, have opted for a different modular approach, High Bandwidth Memory (HBM), which uses multi-channel parallel buses. Much wider than DRAM buses, HBM delivers higher bandwidth. Here again, DRAM die stacking and close coupling to the GPU address performance and power issues. Applying the module approach to GPUs raises the same packaging question as with servers; the answer will likely depend on whether the GPU product is aimed at the consumer or the artificial intelligence (AI) market.
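HBM takes the opposite route to HMC's narrow-and-fast serial links: a very wide parallel interface running at a modest per-pin rate. A sketch using first-generation HBM numbers (a 1024-bit interface per stack at roughly 1 Gb/s per pin; treat these as illustrative):

```python
# HBM per-stack bandwidth: a wide parallel bus at a modest per-pin rate.
BUS_WIDTH_BITS = 1024  # interface width per HBM stack
GBPS_PER_PIN = 1       # first-generation per-pin rate, Gb/s (assumed)

stack_gbs = BUS_WIDTH_BITS * GBPS_PER_PIN / 8
print(stack_gbs)       # 128.0 GB/s per stack

# Several stacks packaged beside a GPU multiply that figure.
print(4 * stack_gbs)   # 512.0 GB/s with four stacks
```

The contrast with the serial-link arithmetic is the point: both schemes reach hundreds of GB/s, but HBM spends pins while HMC spends per-lane signalling rate.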
The FPGA story, as accelerators in server systems, is itself still evolving, though companies like Altera and Xilinx already have HMC prototyping boards available.
This modular approach has major channel implications. The balance of power swings strongly toward memory chip makers such as Micron and away from DIMM assemblers without foundries. This won't happen overnight, though, since a system structure and chip ecosystem have yet to evolve to use the modular solutions properly. The closed nature of smartphones will likely make them the easier market to penetrate.
The server market's move to modular approaches may well be complicated by the rise of a fabric-centric architecture within the server core. Approaches such as Gen-Z make the serial memory links the focus of an RDMA-based fabric that joins CPUs, GPUs, FPGAs, and external communications together, so that all the server elements can use a common memory directly and memory and interfaces can be shared across a cluster. Future plans at AMD and NVIDIA address this tighter coupling of memory to the GPU, so that the current problem of transferring large amounts of data from CPU memory to GPU memory becomes a thing of the past.
Here too, the balance shifts to chip makers at the expense of not only DIMM producers but also add-in card makers. Server motherboards will likely have ZIF sockets for SoC solutions that adapt drives or LANs, plus more sockets for memory and compute elements.
Not all is peace and joy in the vendor camps, though. Intel is talking up a different pathway than the rest of the industry: instead of HMC for servers, there is discussion of a High-Bandwidth, Low-Latency (HBLL) DIMM solution that better fits the performance gap between Optane NVDIMMs and L3 cache.
As a final complication, while these new memories are very fast, capacity is limited, currently to roughly 16 GB or less per module. That is similar to a DIMM, but today's architectures preclude using many modules at once. For systems that want terabyte-sized memory, this is a problem with no elegant solution yet; an Intel-style HBLL approach backed by Optane NVDIMMs may be one answer.
Despite Intel’s apparent apostasy (nothing has been officially announced as product yet), it’s clear that systems will become faster and more modular. Compounding that modularity, NVMe over Fabrics running on Ethernet is getting a lot of market attention as a way to share primary storage, decoupling drives from servers even if they still share packaging. All of this should settle into an industry-wide roadmap in the first half of 2018 and lead to a truly major jump in system performance, which should give an impetus to the whole market for systems and storage.