That Gigabyte trove of documents is proving to be the gift that keeps on giving. Today’s choice tidbits involve AMD’s long-term plan for graphics with Ryzen and a few early details on Genoa. Readers should keep in mind that leaks are, by definition, unofficial and that documentation can change over time.
As for the low-level Genoa details, the purpose of the reference documentation is not to provide an exhaustive summary of the differences between Zen 3 and Zen 4. A certain amount of information has been gleaned about the core from feature-level documentation and some high-level spec sheets. Such information can be valid but contextually incomplete. Any given technical document will discuss the CPU’s improvements and capabilities relative to the specific topic at hand, which means collective information about features is widely distributed and takes more time to analyze.
We’ll talk about the GPU side of things first, then pivot to the CPU.
This document indicates all three types of Socket AM5 CPU will ship with integrated graphics, though the capability may not be offered on every SKU. AMD could be planning to take a page from Intel’s book and offer a series of chips without GPUs at slightly lower prices than the GPU-equipped variants. Of the three CPU types, only one — presumably the desktop CPU family — will offer a full 28 PCIe lanes. The other two variants are limited to just 20 lanes. The difference in Family number likely denotes desktop and laptop CPUs, with a lower-end desktop variant configured more like the laptop chip.
One important point is that AMD may have a chart much like this for Zen 3-based products already. Assume that AMD eventually ships a lower-end APU with four Zen 3 cores and Vega or RDNA2 graphics. If that chip were limited to 20 PCIe lanes, Zen 3’s theoretical product configuration would match what’s shown for Zen 4 at the family level. AMD’s mobile CPUs always carry integrated graphics, and the 5600G and 5700G brought Vega graphics to six-core and eight-core desktop chips.
We’re not claiming other sites have the reporting wrong, and we expect AMD to add graphics to more desktop and laptop products over time, but this page of data does not specifically confirm that the company will do so with Zen 4. The 16MB and 32MB L3 cache sizes it lists match those of existing chips in the 5700G and 5800X families, respectively.
V-Cache also is not mentioned here. This does not mean that AMD will drop V-Cache from Zen 4. It could mean that the company hasn’t finalized its V-Cache plans, that this document was written before those plans were made, or that the information wasn’t relevant to the topic discussed here.
Our own theory is that adding V-Cache to CPUs today might be a precursor to integrating a GPU core cluster at a later date. Stacking a large L3 cache on top of the chip and allowing the GPU to use it would likely boost performance by relieving memory bandwidth pressure. AMD has claimed it can continue scaling V-Cache past 64MB. There are some similarities to Intel’s Crystal Well from 2013, but AMD claims V-Cache can deliver 2TB/s of bandwidth. Intel’s Crystal Well eDRAM was mounted on-package but off-die and provided just 100GB/s of bandwidth. A sufficiently large L3 shared by both CPU and GPU could allow the GPU to out-scale any previous AMD integrated solution, delivering some of the benefits of a wide interface like HBM at (presumably) a lower cost.
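To put the two bandwidth figures side by side, here is a quick back-of-the-envelope sketch. It uses only the numbers quoted above (AMD’s claimed 2TB/s for V-Cache and roughly 100GB/s for Crystal Well) and assumes decimal units, so 1TB/s is treated as 1,000GB/s. The function and constant names are ours, purely for illustration.

```python
# Rough comparison of the two cache bandwidth figures quoted in the text.
# Assumption: decimal units, i.e. 1 TB/s == 1000 GB/s.

V_CACHE_GB_S = 2 * 1000      # AMD's claimed V-Cache bandwidth (2TB/s)
CRYSTAL_WELL_GB_S = 100      # Crystal Well eDRAM bandwidth cited above


def bandwidth_ratio(newer_gb_s: float, older_gb_s: float) -> float:
    """Return how many times more bandwidth the newer part offers."""
    return newer_gb_s / older_gb_s


ratio = bandwidth_ratio(V_CACHE_GB_S, CRYSTAL_WELL_GB_S)
print(f"V-Cache claim vs. Crystal Well: {ratio:.0f}x the bandwidth")
```

These are vendor-quoted peak figures for two very different memory technologies, so the ratio is a sense of scale rather than a prediction of real-world GPU gains.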
Low-Level Genoa Details
Chips and Cheese has also gone digging in the Genoa documentation to find some architectural improvements for Zen 4 relative to Zen 3. Genoa will support VNNI and AVX-512 instructions more generally, with an implementation similar to Ice Lake Server as far as total instructions supported. Chips and Cheese thinks it’s possible Zen 4 will offer either a single 1×512-bit FMA or a pair of 256-bit units that can be ganged to support 512-bit math. Two floating-point units would set Zen 4 up more directly to compete with Intel in AVX-512, while a single 512-bit unit would offer compatibility and increased performance in some scenarios without incurring the same power cost. AMD has previously implemented AVX2 on 128-bit execution units (the original Zen split each 256-bit operation into two 128-bit halves), so AVX-512 on 256-bit hardware would not be unprecedented.
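The idea of running a wide vector instruction on narrower hardware can be sketched in a few lines. This is a toy model, not a description of AMD’s actual design: it assumes 32-bit lanes, treats the execution unit as a loop that handles `unit_bits` worth of lanes per pass, and computes `a * b + c` with ordinary Python arithmetic rather than a true fused operation. All names here are hypothetical.

```python
# Toy model of splitting a wide vector multiply-add across a narrower unit.
# Assumptions: 32-bit lanes; one "pass" processes unit_bits worth of lanes.

LANE_BITS = 32


def vector_fma(a, b, c, vector_bits=512, unit_bits=256):
    """Compute a*b+c lane by lane, returning (result, passes_needed)."""
    lanes = vector_bits // LANE_BITS
    lanes_per_pass = unit_bits // LANE_BITS
    if not (len(a) == len(b) == len(c) == lanes):
        raise ValueError(f"expected {lanes} lanes per operand")

    result = []
    passes = 0
    for start in range(0, lanes, lanes_per_pass):
        stop = start + lanes_per_pass
        # Each pass handles one slice of the full-width vector.
        result.extend(
            x * y + z for x, y, z in zip(a[start:stop], b[start:stop], c[start:stop])
        )
        passes += 1
    return result, passes


a = [float(i) for i in range(16)]   # 16 lanes x 32 bits = 512 bits
b = [2.0] * 16
c = [1.0] * 16

_, passes_256 = vector_fma(a, b, c, vector_bits=512, unit_bits=256)
_, passes_512 = vector_fma(a, b, c, vector_bits=512, unit_bits=512)
print(f"256-bit unit: {passes_256} passes, 512-bit unit: {passes_512} pass")
```

The result is identical either way; only the number of passes changes. That is the trade-off described above: narrower hardware keeps instruction compatibility at lower cost, while full-width units finish the same work in fewer steps.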
Here’s the overview:
There aren’t many changes here, though the expanded L2 and larger DTLB are both welcome. Combined with AVX-512 support and the extensive changes to Genoa’s support for storage-class memory, the server version of the chip will pack a number of enhancements over Zen 3. The fact that we don’t see evidence of more changes from this document could mean that Zen 4 is an iterative improvement on Zen 3, or it could mean that the data is in other documents that deal with other aspects of the chip. If AMD were to hypothetically launch a Zen 4 with a modest IPC boost, a few hundred MHz of additional clock, AVX-512 support, and a faster version of Infinity Fabric, it would hit all the categories that collectively justify calling a CPU a new architectural revision on what came before.
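To see how a modest IPC boost and a few hundred MHz can add up, here is a minimal sketch using the common first-order approximation that single-threaded performance scales with IPC multiplied by clock speed. The numbers below are placeholders we made up for illustration; none of them come from the leaked documents or from AMD.

```python
# First-order estimate: performance ~ IPC x clock frequency.
# All input values below are hypothetical placeholders, not leaked figures.


def relative_performance(ipc_gain: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """Return the combined speedup from an IPC gain and a clock change."""
    return ipc_gain * (new_clock_ghz / old_clock_ghz)


HYPOTHETICAL_IPC_GAIN = 1.10      # placeholder: a 10% IPC improvement
HYPOTHETICAL_OLD_CLOCK = 4.7      # placeholder: GHz
HYPOTHETICAL_NEW_CLOCK = 5.0      # placeholder: 300MHz higher

speedup = relative_performance(
    HYPOTHETICAL_IPC_GAIN, HYPOTHETICAL_OLD_CLOCK, HYPOTHETICAL_NEW_CLOCK
)
print(f"Combined speedup under these assumptions: {speedup:.2f}x")
```

The point is that the two factors multiply rather than add, so several individually modest changes can still produce a meaningful generational gain. Real workloads rarely scale this cleanly, and this estimate ignores AVX-512 and fabric improvements entirely.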
In other words: Don’t conclude that Genoa isn’t much different from Zen 3 on the basis of a few documents. While it may be true that AMD mostly focused on getting the 5nm transition right rather than implementing new features, there’s still plenty of time before Zen 4 ships to learn more about the core.
Chips and Cheese has more detail on Zen 4’s implementation of storage-class memory, so check them out if you want to read more on the topic. It’s interesting that V-Cache is not mentioned in these reports, but the idea that AMD would remove the feature once implemented is also odd. AMD claims the additional L3 is worth a 1.15x performance uplift. If it removes that L3 in future designs, it either has to do so by designing a CPU that doesn’t benefit from it, or by designing a CPU with such a jump in clock and/or IPC that it doesn’t need the cache to hit AMD’s performance targets. Neither of these is impossible. But it’s unlikely that AMD would go to the trouble of building a stacked L3 cache for future Zen 3 chips only to remove it for Zen 4.