Leaks Reveal Nvidia 40-Series With Massive L2 Cache, Almost Double the CUDA Cores
A consortium of Twitter users has been poring over the recently leaked data from the Nvidia hack by the rogue group Lapsus$ and posting their findings online. Thus far, the leaks confirm previous rumors that Nvidia’s next-gen offering will indeed raise the bar. Not only will Nvidia’s upcoming Ada Lovelace GPU field a much larger L2 cache, the flagship chip will also reportedly offer almost double the number of CUDA cores found in the current flagship, the GA102.
The largest change for Nvidia is a staggering 16x increase in total L2 cache on the AD102 compared to existing Ampere GPUs, from 6MB to 96MB, according to a summary via Tom’s Hardware. The AD102 chip will supposedly come with 16MB of cache per 64-bit memory controller, and with the expected 384-bit memory bus that equates to 96MB of cache total. The current GA102 Ampere chip has just 512KB of cache per 32-bit memory controller, so it’s a substantial increase, and one seemingly designed to rival the Infinity Cache AMD introduced with its RDNA2 RX 6800 GPUs. Interestingly, using more cache as opposed to more memory controllers is one way to restrain power consumption, which is ironic given the AD102 die has previously been rumored to consume up to 850W of power.
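For the curious, the 16x multiple falls straight out of the bus-width math. Here’s a minimal Python sketch using only the rumored per-controller figures above (none of these are official specs):

def total_cache_mb(bus_width_bits, controller_width_bits, cache_per_controller_mb):
    # Total L2 = number of memory controllers x cache per controller
    num_controllers = bus_width_bits // controller_width_bits
    return num_controllers * cache_per_controller_mb

ad102_l2 = total_cache_mb(384, 64, 16)   # rumored: 6 controllers x 16MB = 96MB
ga102_l2 = total_cache_mb(384, 32, 0.5)  # current: 12 controllers x 512KB = 6MB
print(ad102_l2, ga102_l2, ad102_l2 / ga102_l2)  # 96.0 6.0 16.0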
This could — could — indicate that Nvidia’s L2 cache consumes significantly more power than AMD’s L3. That would not necessarily be surprising; L1 consumes more power per KB than L2, and L2 consumes more than L3. Alternatively, it could indicate that Nvidia is targeting aggressive clocks, or that the new GPU targets very high power consumption to deliver maximum performance.
Such an increase in effective memory bandwidth is necessary due to the associated increase in CUDA cores, according to Twitter user ftiwvoe via Videocardz. AD102 is also reported to sport 18,432 CUDA cores, a 71 percent boost over GA102’s 10,752 in the upcoming RTX 3090 Ti. Ampere’s current 936GB/s of memory bandwidth would simply be insufficient to keep that many cores fed, so adding a lot of extra cache is likely a better solution than adding more power-hungry memory controllers. All the “Lovelace” dies will receive a lot more cache, too, with the smaller AD103 and AD104 chips packing 64MB and the AD106 carrying 48MB. The baby of the bunch, the AD107, will receive just 32MB, which is still more than 5x the amount in the current GA102 flagship.
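To see why, a quick back-of-the-envelope Python sketch (again, rumored figures only) works out both the core-count jump and how thinly Ampere’s 936GB/s would be spread across AD102’s reported core count if the bus stayed the same:

ad102_cores = 18_432  # rumored full AD102
ga102_cores = 10_752  # full GA102 (RTX 3090 Ti)
print(f"Core boost: {(ad102_cores / ga102_cores - 1) * 100:.0f}%")  # 71%

bandwidth_gbs = 936  # GA102's current memory bandwidth
for name, cores in (("GA102", ga102_cores), ("AD102", ad102_cores)):
    # GB/s available per 1,000 cores at an unchanged 936GB/s
    print(name, round(bandwidth_gbs / cores * 1000, 1))  # 87.1 vs 50.8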
As Tom’s Hardware notes, this seems like a very clear case of Nvidia cribbing from AMD’s approach with its RDNA2 GPUs, as it’s choosing to add more cache instead of a wider memory bus. The rumors indicate Nvidia has no intention of changing the bus width of any configurations for next-gen, as opposed to going all the way to a 512-bit or even 1024-bit memory bus. There may be good historical reason for this. In the past, both AMD and Nvidia have occasionally fielded GPUs with very wide memory buses, but such cards tend to offer relatively low efficiency. It may simply make more sense to use larger caches instead.
As it stands, the RX 6800 GPUs still have even more cache than the rumored RTX 40-series flagship, with 128MB of Infinity Cache for both GPUs in the product stack. However, it’s also possible AMD will up that figure for its RDNA3 GPUs, which are rumored to arrive in the second half of 2022, alongside Nvidia’s new cards in September.