AMD May Have Doubled Per-Core L3 Cache on 7nm Epyc CPUs

AMD revealed its upcoming 7nm Epyc CPUs at its New Horizons event, but it only touched briefly on the architectural enhancements and improvements to the core. We know that the chip pairs a series of 7nm chiplets (each containing eight CPU cores) with a common I/O die, but fine details on cache organization or CCX design haven't been revealed yet. A new data point, courtesy of SiSoft Sandra, suggests that AMD has doubled the amount of L3 cache per CPU core, at least on Epyc.

Image by Overclock3D

Doubling the total amount of L3 cache per core is an expected move for AMD and should help improve Epyc performance overall. AMD's existing CCX implementation allocates 8MB of L3 per CCX, with two CCXes per die. Ping times between logical cores are roughly 26ns within the same physical CPU core, 42ns between cores in the same CCX, and 142ns between different CCXes on the same physical die. That last figure isn't much better than the latency hit you take when you step out to main memory to retrieve data.
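For readers curious how figures like these are gathered, core-to-core "ping" tests typically bounce a shared cache line between two pinned threads and time the round trip. Below is a minimal sketch of that idea in C, assuming Linux with GCC or Clang; the core numbers (0 and 1) are placeholders, and mapping logical cores to the same or different CCXes is system-specific (check lstopo or /sys/devices/system/cpu/).

```c
/* Core-to-core ping-pong latency sketch. Build: gcc -O2 -pthread ping.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS 1000000

static atomic_int flag = 0; /* this cache line bounces between the two cores */

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *responder(void *arg) {
    pin_to_core((int)(long)arg);
    for (int i = 0; i < ITERS; i++) {
        /* wait for the ping, then answer with a pong */
        while (atomic_load_explicit(&flag, memory_order_acquire) != 1)
            ;
        atomic_store_explicit(&flag, 0, memory_order_release);
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    pin_to_core(0);                                  /* "pinger" on core 0 */
    pthread_create(&t, NULL, responder, (void *)1L); /* "ponger" on core 1 */

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ITERS; i++) {
        atomic_store_explicit(&flag, 1, memory_order_release);
        while (atomic_load_explicit(&flag, memory_order_acquire) != 0)
            ;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    pthread_join(t, NULL);

    double ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
    /* each iteration is a full round trip; one-way latency is roughly half */
    printf("round trip: %.1f ns (one-way ~%.1f ns)\n", ns / ITERS, ns / ITERS / 2);
    return 0;
}
```

Running the same binary with the responder pinned to a core in the same CCX versus a core in the other CCX is what exposes the 42ns-versus-142ns gap described above.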

What this means, in aggregate, is that Epyc doesn't actually have a 64MB L3 at all, in any meaningful sense. It has eight 8MB L3 caches. This works just fine for applications that fit into an 8MB cache slice, but it hampers Epyc in any application that doesn't fit this access model. As this memory latency benchmark from Anandtech shows, Epyc's memory latency in dual random reads is quite competitive below 8MB and significantly worse than Intel's above that point.

Graph by Anandtech
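Benchmarks like the one in Anandtech's graph are typically built on pointer chasing: the code walks a randomly shuffled chain of pointers so every load depends on the previous one and the prefetcher can't hide the latency. Here's a minimal sketch of that technique in C; the buffer sizes and step count are arbitrary choices for illustration, not Anandtech's actual methodology.

```c
/* Pointer-chasing latency sketch. Build: gcc -O2 chase.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Average load-to-use latency for a working set of the given size. */
static double chase_ns(size_t bytes) {
    size_t n = bytes / sizeof(void *);
    void **buf = malloc(n * sizeof(void *));
    size_t *idx = malloc(n * sizeof(size_t));

    for (size_t i = 0; i < n; i++) idx[i] = i;
    for (size_t i = n - 1; i > 0; i--) {            /* Fisher-Yates shuffle */
        size_t j = rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
    for (size_t i = 0; i < n; i++)                  /* link into one big cycle */
        buf[idx[i]] = &buf[idx[(i + 1) % n]];

    void **p = &buf[idx[0]];
    const size_t steps = 20 * 1000 * 1000;
    struct timespec s, e;
    clock_gettime(CLOCK_MONOTONIC, &s);
    for (size_t i = 0; i < steps; i++)
        p = (void **)*p;                            /* dependent load chain */
    clock_gettime(CLOCK_MONOTONIC, &e);

    if (p == NULL) puts("unreachable");             /* keep the chain live */
    free(buf); free(idx);
    return ((e.tv_sec - s.tv_sec) * 1e9 + (e.tv_nsec - s.tv_nsec)) / steps;
}

int main(void) {
    /* Sweep working sets from 1MB to 64MB; on first-generation Epyc,
     * per-load latency should step up sharply once the working set no
     * longer fits in a single 8MB L3 slice. */
    for (size_t mb = 1; mb <= 64; mb *= 2)
        printf("%3zu MB: %.1f ns/load\n", mb, chase_ns(mb << 20));
    return 0;
}
```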

Doubling the amount of L3 cache per die will obviously improve performance in applications that fit into a 16MB access pool but not an 8MB slice. I want to caution, however, against concluding that this is the only change AMD has made to Epyc’s overall organization. The decision to organize Epyc as a set of 7nm chiplets that connect to a common I/O die is going to impact core-to-core communication. It’s not clear exactly how things will change with AMD’s Rome silicon because the company hasn’t released this information yet, but there are a lot of knobs and dials AMD could have tweaked. In addition to the physical changes that we know 7nm Epyc incorporates, there are potential changes to caching strategy, Infinity Fabric improvements, CCX design alterations, and even shifts in how AMD manages power consumption in its caches that could potentially impact memory latency. Knowing that the company likely doubled up on L3 cache does tell us something about Rome, but it isn’t the whole story.

How this change could impact desktop Ryzen is unclear. AMD could opt to keep the same L3 cache size per die, or it might fuse off some L3 to recover bad chips or to differentiate between Epyc and Ryzen parts. The company's original Ryzen launch reused the same silicon across all product families to the maximal extent possible, but some of the company's second-generation Ryzen 5 CPUs have smaller L3 caches (8MB on the Ryzen 5 2500X, versus 16MB on the Ryzen 5 1500X).
