There’s a new set of leaks around AMD’s next-generation server CPU, codenamed Milan-X. Milan-X uses the same microarchitecture as AMD’s current Milan, but with one significant difference: Up to 768MB of L3, divided between 8 chiplets and 64 cores.
AMD’s plans to staple an extra 64MB of L3 cache per chiplet via its new V-Cache structures have been well-discussed, but these leaks, if accurate, give us some idea of what kind of trade-offs the company is contemplating between TDP, clock speed, and cache size.
There are four parts in total — a 32-core and a 24-core round things out — but the top-end and bottom-end are the most interesting.
At the high end, AMD is trading ~10 percent of base clock frequency for an extra 512MB of L3. At the 16-core mark, the 7373X gives up ~13 percent of its base frequency but offers no less than 48MB of L3 cache per core (768MB / 16 cores). If Milan-X uses the same chiplet configurations as Milan, AMD is only lighting up two cores per chiplet for a CPU like this, but the company has done something similar before. AMD currently ships an eight-core CPU with 256MB of L3 in total, or 32MB of L3 per core. AMD may be reserving V-Cache for its high-power products; most of AMD’s 16-core chips target a TDP below 240W.
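The cache arithmetic above is easy to check with a short sketch. This is only a back-of-the-envelope calculation: the constants are the figures quoted in this article, the function names are illustrative, and the cores-per-chiplet figure assumes Milan-X keeps Milan's eight-chiplet layout, which the leaks do not confirm.

```python
# Figures quoted in the article; none of this comes from an AMD specification.
CHIPLETS = 8                  # assumes the same chiplet count as Milan
BASE_L3_PER_CHIPLET_MB = 32   # Milan today: 256MB across 8 chiplets
VCACHE_PER_CHIPLET_MB = 64    # extra stacked L3 per chiplet


def l3_totals(chiplets=CHIPLETS):
    """Return (base L3, L3 with V-Cache) in MB for a fully populated package."""
    base = chiplets * BASE_L3_PER_CHIPLET_MB
    stacked = chiplets * (BASE_L3_PER_CHIPLET_MB + VCACHE_PER_CHIPLET_MB)
    return base, stacked


def l3_per_core(total_l3_mb, cores):
    """L3 capacity divided evenly across active cores, in MB."""
    return total_l3_mb / cores


base_mb, stacked_mb = l3_totals()
print(f"base {base_mb}MB, stacked {stacked_mb}MB, extra {stacked_mb - base_mb}MB")

# Only the 64-core and 16-core parts are discussed with 768MB in the article.
for cores in (64, 16):
    print(f"{cores} cores: {l3_per_core(stacked_mb, cores)}MB per core, "
          f"{cores // CHIPLETS} active cores per chiplet")
```

The same helper reproduces the existing eight-core comparison: `l3_per_core(256, 8)` gives 32MB per core.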
Elsewhere, AMD has suggested that its V-Cache is worth ~15 percent performance, which may seem to imply the company is giving up most of its advantage by trading away base clock speed. This probably isn’t true, for several reasons. First, base clock reflects the minimum guaranteed clock, not necessarily the clock the CPU sustains under load. Second, server workloads don’t scale according to the same factors as desktop workloads in all cases.
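To see where the pessimistic reading comes from, here is the naive version of the math as a rough sketch. It assumes performance scales linearly with base clock and that the ~15 percent V-Cache uplift applies uniformly, which are exactly the assumptions the reasons above call into question. The function name is illustrative.

```python
def naive_net_speedup(cache_uplift, clock_change):
    """Multiply the two effects together as if they were independent.

    cache_uplift: fractional gain attributed to the extra cache (0.15 = +15%)
    clock_change: fractional change in base clock (-0.10 = -10%)
    """
    return (1 + cache_uplift) * (1 + clock_change) - 1


# 64-core part: ~15% from cache, ~10% lower base clock
print(f"{naive_net_speedup(0.15, -0.10):+.1%}")

# 16-core part: ~15% from cache, ~13% lower base clock
print(f"{naive_net_speedup(0.15, -0.13):+.1%}")
```

Under those assumptions the net gain shrinks to a few percent at most, which is why the base clock figures look worse on paper than they are likely to be in practice.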
Desktop workloads tend to be latency bound as opposed to throughput bound. Obviously, there are server applications that also run into latency bottlenecks, but AMD’s 64-core CPU requires quite a lot of memory bandwidth to feed it. The enormous L3 cache on these chips will offset memory bandwidth demands. If AMD builds a new 64-core Threadripper on this design, and I see no reason to think it won’t, we can expect the chip to offer better performance in bandwidth-limited workloads in particular. Tests of the 3990X against the 3995WX showed that the former is quite memory bandwidth limited. AMD might also save some power on fabric if it can keep data local to the CPU more often, though this could be workload-dependent.
Slapping a giant L3 cache on top of a chip doesn’t necessarily sound like much of an advance, but AMD’s ability to stack the die vertically and the work it has done to keep the cache responsive make this a very interesting chip. No one has brought a commercial high-end CPU product to market with a 3D chip stack like this, though Intel has its own extensive plans around die stacking via packaging technologies like Foveros and EMIB. With Milan, AMD had to make some power trade-offs between interconnect and cores, so we’ll see how an extra 64MB of L3 cache per chiplet changes the power equation in the next few months.