AMD May Have Doubled Per-Core L3 Cache on 7nm Epyc CPUs


AMD may have revealed its upcoming 7nm Epyc CPUs at its New Horizons event, but it only touched on many of the architectural enhancements and improvements to the core. We know that the chip pairs a series of 7nm chiplets (each containing eight CPU cores) with a common I/O die, but fine details on cache organization or CCX design haven't been revealed yet. A new data point from SiSoft Sandra suggests that AMD has doubled the amount of L3 cache per CPU core, at least on Epyc.

Image by Overclock3D

Doubling the amount of L3 cache per core is an expected move for AMD and should help improve Epyc performance overall. AMD's existing CCX implementation allocates 8MB of L3 per CCX, with two CCXs per die. Ping times between logical cores are roughly 26ns within the same physical CPU core, 42ns within the same CCX, and 142ns to a different CCX on the same physical die. That last figure isn't much better than the latency hit you take when you go out to main memory to retrieve data.
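To make the three latency tiers above concrete, here is a minimal Python sketch that maps a pair of logical CPUs to the quoted ping times. The topology constants (two SMT threads per core, four cores per CCX, two CCXs per die) and the contiguous CPU numbering are assumptions for illustration; real operating systems may enumerate logical CPUs differently, and the latencies are only the approximate figures quoted above. Cross-die latency isn't given here, so the sketch returns `None` for that case rather than guessing.

```python
# Core-to-core "ping" latency tiers on first-generation (Zen 1) Epyc, using the
# approximate figures quoted in the text. Cross-die latency is not quoted, so it
# is deliberately left undefined.
from dataclasses import dataclass
from typing import Optional

# Assumed topology and a hypothetical contiguous numbering of logical CPUs.
THREADS_PER_CORE = 2
CORES_PER_CCX = 4
CCX_PER_DIE = 2

# Approximate ping latencies in nanoseconds, as quoted in the text.
SAME_CORE_NS = 26
SAME_CCX_NS = 42
OTHER_CCX_SAME_DIE_NS = 142


@dataclass(frozen=True)
class Location:
    die: int
    ccx: int   # CCX index within its die
    core: int  # core index within its CCX


def locate(cpu: int) -> Location:
    """Map a logical CPU ID to (die, CCX, core) under the assumed numbering."""
    core_global = cpu // THREADS_PER_CORE
    ccx_global = core_global // CORES_PER_CCX
    return Location(
        die=ccx_global // CCX_PER_DIE,
        ccx=ccx_global % CCX_PER_DIE,
        core=core_global % CORES_PER_CCX,
    )


def ping_latency_ns(a: int, b: int) -> Optional[int]:
    """Approximate ping time between two logical CPUs; None if on different dies."""
    la, lb = locate(a), locate(b)
    if la.die != lb.die:
        return None  # not covered by the quoted figures
    if la.ccx != lb.ccx:
        return OTHER_CCX_SAME_DIE_NS
    if la.core != lb.core:
        return SAME_CCX_NS
    return SAME_CORE_NS


if __name__ == "__main__":
    for a, b in [(0, 1), (0, 2), (0, 8), (0, 16)]:
        print((a, b), ping_latency_ns(a, b))
    # (0, 1) 26, (0, 2) 42, (0, 8) 142, (0, 16) None
```

The point of modeling it this way is that latency depends on where two threads sit in the topology, not just on how many cores the package has.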

What this means, in aggregate, is that Epyc doesn't actually have a 64MB L3 in any meaningful sense. It has eight L3 caches of 8MB each. This works just fine for applications that fit into an 8MB cache slice, but it hampers Epyc on any application that doesn't fit this access model. As this memory latency benchmark from Anandtech shows, Epyc's latency in dual random reads is quite competitive below 8MB and significantly worse than Intel's above that point.

Graph by Anandtech
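The gap between the headline 64MB figure and what a single thread can use cheaply comes down to simple arithmetic. The sketch below assumes four dies per package and four cores per CCX, which follows from eight 8MB slices at two CCXs per die; `l3_summary` is a hypothetical helper written for illustration, not part of any AMD tool.

```python
# Nominal vs. per-slice L3 capacity on first-generation (Zen 1) Epyc.
# From the text: 8MB of L3 per CCX, two CCXs per die, eight slices in total.
L3_PER_CCX_MB = 8
CCX_PER_DIE = 2
DIES = 4           # assumption: 8 slices / 2 CCXs per die
CORES_PER_CCX = 4  # assumption: 8 cores per die split across 2 CCXs


def l3_summary(l3_per_ccx_mb: int = L3_PER_CCX_MB) -> dict:
    """Contrast the headline L3 total with the slice one thread sees locally."""
    slices = DIES * CCX_PER_DIE
    return {
        "nominal_total_mb": slices * l3_per_ccx_mb,  # the spec-sheet total
        "slices": slices,                            # independent L3 caches
        "local_pool_mb": l3_per_ccx_mb,              # reachable without leaving the CCX
        "per_core_mb": l3_per_ccx_mb / CORES_PER_CCX,
    }


if __name__ == "__main__":
    print(l3_summary())
    # {'nominal_total_mb': 64, 'slices': 8, 'local_pool_mb': 8, 'per_core_mb': 2.0}
```

Only `local_pool_mb` reflects what a single-threaded working set can use without paying the cross-CCX penalty.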

Doubling the amount of L3 cache per CCX will obviously improve performance in applications whose working sets fit into a 16MB pool but not an 8MB slice. I want to caution, however, against concluding that this is the only change AMD has made to Epyc's overall organization. The decision to organize Epyc as a set of 7nm chiplets that connect to a common I/O die is going to impact core-to-core communication. It's not clear exactly how things will change with AMD's Rome silicon because the company hasn't released this information yet, but there are a lot of knobs and dials AMD could have tweaked. In addition to the physical changes that we know 7nm Epyc incorporates, there are potential changes to caching strategy, Infinity Fabric improvements, CCX design alterations, and even shifts in how AMD manages power consumption in its caches that could impact memory latency. Knowing that the company likely doubled up on L3 cache does tell us something about Rome, but it isn't the whole story.
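As a rough illustration of which workloads stand to gain, the sketch below treats the L3 as a simple fits-or-spills check against the slice a thread can reach locally. It assumes the doubling lands at the CCX level (16MB per CCX instead of 8MB) and ignores associativity, L2 contents, sharing between threads, and replacement policy, so it marks a range of interest rather than predicting real speedups.

```python
# Which working sets gain if the locally reachable L3 slice grows from 8MB to 16MB?
# Deliberately simplified: a working set either fits in the slice or spills.
OLD_SLICE_MB = 8   # first-generation Epyc: 8MB per CCX
NEW_SLICE_MB = 16  # assumption: the rumored doubling applies per CCX


def fits(working_set_mb: float, slice_mb: float) -> bool:
    """True if the working set fits entirely within one L3 slice."""
    return working_set_mb <= slice_mb


def newly_fits(working_set_mb: float) -> bool:
    """True for working sets that spill out of an 8MB slice but fit in 16MB."""
    return not fits(working_set_mb, OLD_SLICE_MB) and fits(working_set_mb, NEW_SLICE_MB)


if __name__ == "__main__":
    for ws in (4, 8, 12, 16, 32):
        print(ws, fits(ws, OLD_SLICE_MB), fits(ws, NEW_SLICE_MB), newly_fits(ws))
    # Only the 12MB and 16MB rows print True in the last column.
```

Working sets that already fit in 8MB see no capacity benefit, and those well beyond 16MB still spill, which is why the gain is concentrated in the band between the two sizes.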

How this change could impact desktop Ryzen is unclear. AMD could opt to keep the same L3 cache size per die, or it might fuse off some L3 to recover bad chips or differentiate between Epyc and Ryzen parts. The company's original Ryzen launch reused the same silicon across all product families to the maximal extent possible, but some of its second-generation Ryzen 5 CPUs have smaller L3 caches (8MB on the Ryzen 5 2500X, versus 16MB on the Ryzen 5 1500X).
