Nvidia Unveils Ampere A100 80GB GPU With 2TB/s of Memory Bandwidth

Ampere launched only six months ago, but Nvidia is already upgrading the top-end version of its GPU to offer even more VRAM and considerably more bandwidth. The A100 (80GB) keeps most of the A100 (40GB)'s specifications: the 1.41GHz boost clock, 5120-bit memory bus, 19.5 TFLOPS of single-precision compute, NVLink 3 support, and 400W TDP are all unchanged from the previous iteration of the GPU. Both chips also feature 6,912 CUDA cores.

What’s different is the maximum amount of VRAM (80GB, up from 40GB) and the memory speed (3.2Gbps HBM2E, rather than 2.4Gbps HBM2). Bandwidth across the entire memory array is 2TB/s, up from 1.6TB/s. This is a strong upgrade; it wouldn’t have been unusual for Nvidia to reduce the memory bandwidth of the array in order to double the capacity. Instead, the company boosted total bandwidth by 1.25x.

The A100 package carries six stacks of HBM2, but Nvidia disables one of them to improve yield. The remaining five stacks each have a 1024-bit memory bus, which is where the 5120-bit bus figure comes from. For the 80GB model, Nvidia replaced the 40GB A100's HBM2 with HBM2E, which allowed it to substantially upgrade the base specs.
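The arithmetic behind those bandwidth figures is straightforward: total bus width times per-pin data rate, divided by eight to convert bits to bytes. The sketch below is my own back-of-the-envelope check, not Nvidia's spec sheet, and the small gap between the computed and marketed numbers comes from rounding:

```python
def hbm_bandwidth_tbs(stacks, bus_bits_per_stack, gbps_per_pin):
    """Aggregate HBM bandwidth in TB/s: bus width x data rate / 8 bits per byte."""
    total_bus_bits = stacks * bus_bits_per_stack          # 5 x 1024 = 5120 bits
    gb_per_s = total_bus_bits * gbps_per_pin / 8          # Gbit/s -> GB/s
    return gb_per_s / 1000                                # GB/s -> TB/s

# A100 40GB: five active HBM2 stacks at 2.4Gbps per pin
print(hbm_bandwidth_tbs(5, 1024, 2.4))  # ~1.54 TB/s, marketed as 1.6TB/s

# A100 80GB: same bus, HBM2E at 3.2Gbps per pin
print(hbm_bandwidth_tbs(5, 1024, 3.2))  # ~2.05 TB/s, marketed as 2TB/s
```

Note that the capacity doubling comes entirely from denser stacks (16GB per stack instead of 8GB); the bus layout itself is unchanged.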

The 80GB flavor should benefit workloads that are both capacity-limited and memory-bandwidth-bound. Like the 40GB variant, the A100 80GB can be partitioned into up to seven hardware instances, but each instance can now be allocated up to 10GB of VRAM, double the 5GB cap of the 40GB card.

Nvidia is selling these GPUs in mezzanine cards expected to be deployed in either an HGX or a DGX configuration. Customers who want an individual A100 GPU in a PCIe card are still limited to the 40GB variant, though this could change in the future.

The price tag on a server full of 80GB A100 cards is going to be firmly in “if you have to ask, you can’t afford it” territory. But there’s a reason companies on the cutting edge of AI development might pay so much. GPU model complexity is limited by onboard memory. If you have to touch main system memory, overall performance will crater — CPUs may have the kind of DRAM capacities that AI researchers would love for their models, but they can’t provide the necessary bandwidth (and CPUs aren’t great for modeling neural networks in any case). Expanding the total pool of onboard VRAM may allow developers to increase the absolute complexity of the model they’re training or to tackle problems that couldn’t previously fit into a 40GB VRAM pool.
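To see why 80GB matters, consider a rough training-footprint estimate. The numbers below are a hypothetical back-of-the-envelope sketch (assuming FP16 weights and gradients plus Adam-style FP32 optimizer state, and ignoring activations entirely), not Nvidia's or any framework's accounting:

```python
def training_footprint_gb(params_billions, bytes_weights=2, bytes_grads=2,
                          bytes_optimizer=8):
    """Approximate VRAM needed to train a model, in GB.

    Assumes FP16 weights (2 bytes) and gradients (2 bytes), plus two FP32
    Adam moment buffers (8 bytes) per parameter. Activations, which often
    dominate, are not counted.
    """
    per_param_bytes = bytes_weights + bytes_grads + bytes_optimizer
    return params_billions * 1e9 * per_param_bytes / 1e9

# Under these assumptions, a 3B-parameter model already needs ~36GB
# before a single activation is stored:
print(training_footprint_gb(3))  # 36.0
```

By this rough math, a model that overflows a 40GB card's budget can fit comfortably in 80GB, which is exactly the pitch Nvidia is making to AI researchers.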
