AMD’s Ryzen 6000 APUs will reportedly feature RDNA2-derived integrated graphics with up to 12 compute units (CUs). The new GPU would offer a substantial improvement in performance per clock and up to 768 stream processors (12 CUs at 64 processors each). This rumor comes from @ExecutableFix, who has said in other tweets that he expects AMD’s Rembrandt to be a Zen 3+ design built on TSMC’s 6nm node. The laptop variant of the chip would offer an x8 PCIe 4.0 link for a discrete GPU, which is equivalent in bandwidth to x16 PCIe 3.0.
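The PCIe equivalence is easy to sanity-check from the standard per-lane signaling rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0; both generations use 128b/130b encoding). The function name here is mine, purely for illustration:

```python
# Back-of-the-envelope check that PCIe 4.0 x8 matches PCIe 3.0 x16.
# Rates are per lane, per direction; both generations use 128b/130b encoding.
def pcie_bandwidth_gbs(gigatransfers_per_s, lanes):
    """Usable bandwidth in GB/s after 128b/130b encoding overhead."""
    return gigatransfers_per_s * lanes * (128 / 130) / 8

gen3_x16 = pcie_bandwidth_gbs(8.0, 16)   # PCIe 3.0: 8 GT/s per lane
gen4_x8 = pcie_bandwidth_gbs(16.0, 8)    # PCIe 4.0: 16 GT/s per lane
print(f"PCIe 3.0 x16: {gen3_x16:.2f} GB/s, PCIe 4.0 x8: {gen4_x8:.2f} GB/s")
```

Both work out to roughly 15.75GB/s, which is why a Gen4 x8 link costs a laptop nothing in dGPU bandwidth versus a Gen3 x16 link.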
Rembrandt is RDNA 2 based with a maximum of 12 CUs 🔥
— ExecutableFix (@ExecuFix) May 8, 2021
It’s not surprising to hear that AMD is finally retiring Vega. Vega’s GPU architecture had a very poor initial outing in 2017, redeemed itself a bit with the Radeon VII, and finally flowered as an integrated GPU, of all things. AMD got an impressive uplift out of Vega when it transitioned the core for APU use, but RDNA2 brings some real improvements of its own.
First, RDNA introduced a significant 1.25x improvement in GPU performance per clock compared with GCN, courtesy of a 4x higher peak instruction rate. Approximately 60 percent of the architecture’s gains over GCN were attributed to low-level performance and efficiency improvements. We don’t know if RDNA2 will clock better than Vega APUs did, but the IPC gains will be welcome.
The really interesting question about an integrated RDNA2 GPU is whether AMD will deploy Infinity Cache in some form for an APU. ExecutableFix’s leak above doesn’t speak to this point. Using a large cache to improve integrated graphics performance is not a new strategy; some motherboards used to offer a single 256MB DDR3 RAM chip on board as a dedicated graphics buffer (AMD called this “sideport memory”) to improve performance over the iGPU alone. Intel has also shipped iGPUs backed by an on-package eDRAM cache, as in its Crystal Well-equipped Iris Pro parts.
AMD has not launched enough RDNA2 GPUs for us to be certain what the relationship is between core count and cache size. At present, the 6900 XT has 1MB of Infinity Cache for every 40 GPU cores (5,120 cores, 128MB), while the 6700 XT has 1MB for roughly every 27 cores (2,560 cores, 96MB). The core-to-cache ratio, however, may not be as important as the resolution target. If 128MB is enough for 4K and 96MB is enough for 1440p, it’s possible that AMD needs a minimum Infinity Cache size (32-64MB) to make the strategy effective.
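Those ratios fall straight out of the public core counts and cache sizes; a quick sketch (variable names are mine):

```python
# Recomputing the cache-to-core ratios from the public specs:
# 5,120 stream processors / 128 MB for the 6900 XT, 2,560 / 96 MB for the 6700 XT.
cards = {
    "RX 6900 XT": {"cores": 5120, "infinity_cache_mb": 128},
    "RX 6700 XT": {"cores": 2560, "infinity_cache_mb": 96},
}

for name, spec in cards.items():
    cores_per_mb = spec["cores"] / spec["infinity_cache_mb"]
    print(f"{name}: {cores_per_mb:.1f} cores per MB of Infinity Cache")
```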
The biggest reason to think AMD would bring Infinity Cache to APUs is the performance boost it could deliver. The biggest reason to think it wouldn’t is die size. The Radeon 5700 XT, which has exactly the same number of cores, texture mapping units, and ROPs as the 6700 XT, has a 251mm2 die. The 6700 XT die is 335mm2, roughly 1.33x larger, and much of the difference is the 6700 XT’s 96MB of on-die cache. AMD could probably get away with denser layouts on an APU iGPU that isn’t intended to sustain aggressive clock speeds, but there’d be a die-size penalty no matter what. Even a 5nm transition wouldn’t help much; SRAM has not scaled as effectively as logic at smaller nodes.
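The die-size arithmetic, using the area figures quoted above (variable names are mine):

```python
# Die-size comparison between Navi 10 (RX 5700 XT) and Navi 22 (RX 6700 XT).
# Both GPUs have 2,560 cores; the extra area largely pays for 96 MB of Infinity Cache.
navi10_mm2 = 251   # RX 5700 XT die size
navi22_mm2 = 335   # RX 6700 XT die size

ratio = navi22_mm2 / navi10_mm2
extra_area = navi22_mm2 - navi10_mm2
print(f"{ratio:.2f}x larger (+{extra_area} mm^2)")
```

An extra ~84mm2 would be an enormous ask on a mobile APU die, which is the heart of the trade-off discussed here.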
This is why it matters whether AMD can scale its cache size down proportionally or has to maintain a certain minimum. If a 16MB or 32MB Infinity Cache can still provide benefits, AMD might use one. If it needs a 64MB cache to deliver enough of a 1080p performance boost to meaningfully reduce memory bandwidth pressure, that seems less likely. It’s possible that a smaller Infinity Cache with a lower hit rate could still provide a performance and power advantage in an iGPU context: memory bandwidth is such a bottleneck for iGPUs that the cache wouldn’t need as high a hit rate as it does in a discrete GPU to provide effective acceleration.
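To illustrate that hit-rate argument, here’s a deliberately simplified model of my own construction (not AMD data): treat the bandwidth the GPU sees as a hit-rate-weighted blend of on-die cache bandwidth and DRAM bandwidth. The 1,000GB/s cache figure is a placeholder assumption; the 76.8GB/s DRAM figure is the dual-channel DDR5-4800 number discussed below.

```python
# Toy model: effective bandwidth as a hit-rate-weighted blend of cache and DRAM.
# Cache bandwidth here is a placeholder assumption, not a measured figure.
def effective_bandwidth(hit_rate, cache_gbs, dram_gbs):
    """Average bandwidth seen by the GPU for a given cache hit rate."""
    return hit_rate * cache_gbs + (1 - hit_rate) * dram_gbs

DRAM = 76.8     # GB/s, dual-channel DDR5-4800
CACHE = 1000.0  # GB/s, assumed on-die SRAM bandwidth

for hit_rate in (0.0, 0.25, 0.50):
    bw = effective_bandwidth(hit_rate, CACHE, DRAM)
    print(f"hit rate {hit_rate:.0%}: {bw:.0f} GB/s effective")
```

Even at a modest 25 percent hit rate, the model’s effective bandwidth roughly quadruples over DRAM alone, which is why a small cache could still pay off when DRAM is the bottleneck.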
Our tests of the Infinity Cache showed that 96MB was enough for the 6700 XT’s small 192-bit memory bus to perform at least as effectively as the 256-bit bus on the Radeon 5700 XT. They also showed that the Radeon 6700 XT draws considerably less power than the Radeon 5700 XT when both chips are clocked at ~1.85GHz: 267W versus 365W. At least in discrete GPUs, a large L3 cache offers a real opportunity for power savings.
One other factor to consider is that Rembrandt is also supposed to introduce DDR5 support, including launch support for DDR5-4800. This is why the minimum effective size for an Infinity Cache matters. If AMD has to choose between putting 64MB of on-die SRAM on an APU and relying on DDR5, which offers a 1.5x increase in memory bandwidth over DDR4-3200 out of the box, it’s probably going to choose DDR5. Expanding the mobile GPU to 768 cores gives the chip a little more room to stretch its legs, while dual-channel DDR5-4800 provides up to 76.8GB/s of memory bandwidth. AMD might decide that’s enough headroom for Zen 3+ APUs, especially if a 16-32MB Infinity Cache delivers little benefit.
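Those bandwidth figures can be sanity-checked from the transfer rates, assuming the usual 64-bit bus per channel and two channels (function name is mine):

```python
# Peak memory bandwidth: transfer rate (MT/s) x 8-byte bus x channel count.
def dual_channel_gbs(mt_per_s, bus_bytes=8, channels=2):
    """Peak bandwidth in GB/s for a dual-channel, 64-bit-per-channel setup."""
    return mt_per_s * bus_bytes * channels / 1000

ddr4 = dual_channel_gbs(3200)  # DDR4-3200, common in Zen 3 laptops
ddr5 = dual_channel_gbs(4800)  # DDR5-4800, the rumored Rembrandt launch speed
print(f"DDR5-4800: {ddr5:.1f} GB/s, {ddr5 / ddr4:.1f}x over DDR4-3200")
```

That works out to 76.8GB/s versus 51.2GB/s, the 1.5x uplift cited above.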