Rumor: AMD Working on ‘Milan-X’ With 3D Die Stacking, Onboard HBM

Back at its Financial Analyst Day in 2020, AMD showed a diagram emphasizing its server CPU design chops and making clear that chiplets weren’t the last step in the evolution of its CPUs. The diagram drew a line from AMD’s first deployment of HBM in 2015 through the launch of chiplets and on to a future CPU delivering X3D packaging with a combination of 2.5D and 3D technologies.

AMD never announced a specific product that would bring X3D to market, but a new rumor suggests the company is working on a product codenamed “Milan-X.” Milan-X would be based on AMD’s most recent Epyc processor architecture, but it would offer far more memory bandwidth than we’ve seen in an AMD server before.

Milan-X aka Milan-X(3D). Genesis IO-die with stacked chiplets

I love lasagna 😋 https://t.co/O2FrGxyd8P

— ExecutableFix (@ExecuFix) May 25, 2021

AMD’s next-gen I/O die is supposedly called Genesis I/O, and the entire combined 2.5D/3D stack sits on top of a large interposer. AMD’s official diagram shows a 4-high stack of HBM per CPU cluster, with one HBM stack dedicated to each chip.

It is possible that AMD’s diagram above is only intended to show the general concept of what the company plans to build, not to accurately convey the final design of the product. If the diagram is accurate, it suggests Milan-X will either feature more cores per chiplet (16 per chiplet would be needed to hit 64 cores across four chiplets) or top out at 32 cores. It would also mean AMD’s I/O die sits underneath the cluster of chiplets.

This would definitely qualify as 3D chip stacking, but it also raises questions about how much power the I/O die will draw. It seems likely that AMD would finally shrink the I/O die down to 7nm, if only to limit overall power consumption.

3D die stacking has always been difficult, outside of low-power environments, due to the problem of moving heat from the bottom to the top of the stack without cooking some part of the chip stack in the process. The Holy Grail of chip stacking is to put multiple high-power chiplets on top of each other as opposed to laying them out side-by-side, but Intel and AMD have both decided to tackle something a bit easier first: putting a hot chip on top of a cool one.

Intel doesn’t use the same X3D technology that AMD is rumored to be shipping for Milan-X, but its Foveros 3D interconnect allowed the company’s low-power Lakefield processor to stack a compute die (one big Ice Lake-class core plus four low-power “Tremont” cores) on top of a base I/O die. With Milan-X, AMD would be tackling something considerably more complex, again assuming both that this rumor is true and that the I/O die is underneath the chip cluster.

Milan-X is said to be a data-center-only chip, and it isn’t clear what kind of cooling solution would be required to deal with the CPU’s unique structure. Presumably, AMD will want to stick to forced air, but liquid and immersion cooling are also possible.

The amount of bandwidth Milan-X would offer in this configuration is unparalleled. Our recent TRACBench debut illustrated how much the additional memory bandwidth of the eight-channel Threadripper Pro 3995WX can boost performance compared with the quad-channel 3990X, even when the latter runs at a higher clock speed. In that comparison, the 3995WX has up to 204.8GB/s of memory bandwidth to split across 64 cores.

If each Milan-X chiplet is still eight cores and the chip uses mainstream, commercially available HBM2E, we’d be looking at somewhere between 300 and 500GB/s of memory bandwidth per chiplet. Total available memory bandwidth across the entire chip should break 1TB/s and could reach 2TB/s. Whatever other constraints might bind Milan-X at that point, bandwidth would not be among them. The chip presumably supports off-package memory as well, however. Even if we assume a near-term breakthrough allowing for 32GB per HBM2E stack, four stacks would only provide 128GB of RAM and eight stacks just 256GB. AMD’s current servers support 4TB of RAM per socket, so there’s no chance of replacing that kind of capacity with an equivalent amount of on-package HBM2E.
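To put rough numbers behind those figures, here’s a quick back-of-the-envelope sketch in Python. The per-stack HBM2E bandwidth, the stack counts, and the 32GB-per-stack capacity are assumptions pulled from the ranges above, not confirmed Milan-X specifications.

```python
# Back-of-the-envelope math for the bandwidth and capacity figures above.
# All inputs are assumptions from the rumored ranges in the text, not
# confirmed Milan-X specifications.

DDR4_3200_CHANNEL_GBS = 25.6  # peak GB/s per DDR4-3200 memory channel

# Baseline: eight-channel Threadripper Pro 3995WX
tr_pro_bw = 8 * DDR4_3200_CHANNEL_GBS  # 204.8 GB/s
print(f"3995WX (8ch DDR4-3200): {tr_pro_bw:.1f} GB/s, "
      f"or {tr_pro_bw / 64:.1f} GB/s per core across 64 cores")

# Hypothetical Milan-X: one HBM2E stack per chiplet. A single HBM2E stack
# delivers roughly 307-461 GB/s (2.4-3.6Gbps per pin on a 1024-bit bus).
hbm2e_stack_gbs = (307.2, 460.8)
for stacks in (4, 8):
    low, high = (stacks * bw for bw in hbm2e_stack_gbs)
    print(f"{stacks} HBM2E stacks: {low / 1000:.2f}-{high / 1000:.2f} TB/s total")

# Capacity check: even an optimistic 32GB-per-stack HBM2E falls far short
# of the 4TB of off-package DDR4 an Epyc socket supports today.
for stacks in (4, 8):
    print(f"{stacks} stacks x 32GB = {stacks * 32} GB of on-package HBM")
```

With four stacks, the aggregate lands at roughly 1.2-1.8TB/s, which is where the “break 1TB/s and could reach 2TB/s” estimate comes from; the capacity lines show why off-package DDR4 would still be necessary.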

Milan-X looks like the kind of chip AMD could bring to bear against Sapphire Rapids. That CPU is expected to feature somewhere between 56 and 80 cores (reports have varied), and it also integrates HBM2 on-package. Sapphire Rapids is currently expected in late 2021 or early 2022. No launch date for Milan-X has been reported.