Report: Nvidia’s Next-Gen GPU Could Pack 18,432 CUDA Cores, 64TFLOPS

Fresh leaks suggest that Nvidia’s next-generation GPU will be named after computing pioneer Ada Lovelace and will bring an enormous jump in maximum GPU core counts relative to current parts. The leaks, posted by @kopite7kimi, suggest the chip could pack 12 GPU processing clusters, 72 texture processing clusters, and a total of 144 streaming multiprocessors. Assuming the company sticks with 128 GPU cores per streaming multiprocessor, that brings Lovelace up to 18,432 cores.
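The core count follows directly from the leaked unit hierarchy. Here is a minimal Python sketch of that arithmetic. The unit counts are the rumored figures, the 128-cores-per-SM value is the Ampere carry-over assumed above rather than anything confirmed for Lovelace, and the variable names are purely illustrative.

```python
# Leaked AD102 unit counts (rumor, not confirmed by Nvidia).
GPCS = 12   # GPU processing clusters
TPCS = 72   # texture processing clusters
SMS = 144   # streaming multiprocessors

# Assumption: Lovelace keeps Ampere's 128 FP32 cores per SM.
CORES_PER_SM = 128

tpcs_per_gpc = TPCS // GPCS        # how many TPCs sit in each GPC
sms_per_tpc = SMS // TPCS          # how many SMs sit in each TPC
total_cores = SMS * CORES_PER_SM   # total FP32 cores on the full chip

print(f"{tpcs_per_gpc} TPCs per GPC, {sms_per_tpc} SMs per TPC")
print(f"{total_cores:,} FP32 cores")
```

If Nvidia changes the per-SM core count, as it did between Turing and Ampere, the total scales with it, so the 18,432 figure is only as solid as that one assumption.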

So, nVidia's AD102 chip maybe is like: 12 GPC, 72 TPC, 144 SM, 18'432 FP32 units, ~66 TFlops FP32 power (on 1.8 GHz) https://t.co/A8OnUktE1s

— 3DCenter.org (@3DCenter_org) December 28, 2020

3DCenter.org believes the GPU will clock at around 1.75GHz (the 1.8GHz prediction above got trimmed back a bit), which is where the roughly 64 TFLOPS figure comes from. This would imply clocks roughly comparable to current chips like the RTX 3090, though that card can boost higher than Nvidia’s official clocks. Ada might or might not follow the same behavior.
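The TFLOPS numbers can be reproduced from the rumored core count and clock. The sketch below uses the conventional peak-throughput formula, which counts one fused multiply-add (two floating-point operations) per core per clock. The function name is made up for illustration, and the inputs are the rumored values, not confirmed specifications.

```python
def peak_fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Counts one fused multiply-add (2 FLOPs) per core per clock.
    """
    return cores * 2 * clock_ghz / 1000.0

# Rumored core count; assumes 128 FP32 cores per SM across 144 SMs.
CORES = 18_432

# Compare the trimmed-back 1.75GHz estimate with the 1.8GHz figure from the tweet.
for clock_ghz in (1.75, 1.8):
    print(f"{clock_ghz} GHz -> {peak_fp32_tflops(CORES, clock_ghz):.1f} TFLOPS")
```

This is a theoretical ceiling. Real workloads rarely sustain it, and actual boost clocks may land above or below either estimate.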

The interesting thing about that prediction is that it doesn’t square with what’s been conventionally predicted for 5nm GPUs. According to TSMC, 5nm is only expected to introduce modest performance and power consumption improvements of ~15 percent and ~20 percent, respectively. The big winner on 5nm is supposed to be density, with up to a 45 percent gain over 7nm, though these improvements tend to depend on exactly what kind of chip you are trying to build in the first place. Structures intended for high-speed operation tend to be larger and draw more power than a more modest implementation, so they typically see less of the headline density gain.

Nvidia’s huge core count expansion would make sense given predicted density improvements, but power consumption is a major unknown. The RTX 3090 significantly outperforms Turing, but Nvidia had to expand the GPU’s power consumption to do it, up to 350W from 280W. It’s not clear how much additional headroom exists to keep pushing GPU power consumption. I won’t claim to know exactly where the cutoff would be, but it’s difficult to imagine Nvidia shipping 450W-500W cards for consumer systems. At some point, Nvidia is going to have to rein in the growth of its own power budgets. Intel and AMD will allow their respective CPUs to draw over 200W of power in short boosts, but they don’t sit at those power levels long-term by default.

Remarks on the increase in L2 cache don’t mean much, at this point. If you scaled up Ampere from RTX 3090 to the 18,432 cores contemplated by this design, you’d wind up with more total L2 on-die no matter what. It’s an unknown whether Nvidia will adopt any of the features we’ve seen AMD deploy on its own RDNA2 architecture, like a large, on-die central cache (AMD refers to this as its “Infinity” cache).

Lovelace is currently expected in 2022. It’s not known if Nvidia will launch a true Ampere refresh cycle in 2021, or if the company will instead opt to launch high-VRAM variants of cards. There are rumors of an RTX 3080 Ti (20GB) and an RTX 3060 Ti with 12GB of RAM — an RTX 3070 Ti with 16GB of RAM would fit neatly in the stack. Nvidia could potentially pair these VRAM jumps with higher clocks or slightly more GPU cores across the new hardware for any 2021 refresh cycle.

We haven’t heard anything yet about additional features Ada might introduce, or where Nvidia will choose to build the chip. Nvidia began building its Ampere cores at Samsung on that firm’s 8N node, but there have been rumors that poor yields with Samsung pushed Nvidia to swap back to TSMC for future product launches coming in 2021.

Feature image is Nvidia’s Ampere. No images or mock-ups of Lovelace have been released.