Nvidia Unveils Ampere A100 80GB GPU With 2TB/s of Memory Bandwidth

Ampere launched only six months ago, but Nvidia is already upgrading the top-end version of its GPU to offer even more VRAM and considerably more bandwidth. The A100 (80GB) keeps most of the A100 (40GB)’s specifications: the 1.41GHz boost clock, 5120-bit memory bus, 19.5 TFLOPS of single-precision compute, NVLink 3 support, and 400W TDP are all unchanged from the previous iteration of the GPU. Both chips also feature 6,912 GPU cores.

What’s different is the maximum amount of VRAM (80GB, up from 40GB) and the memory data rate (3.2Gbps HBM2E, up from 2.4Gbps HBM2). Bandwidth across the entire memory array is 2TB/s, up from 1.6TB/s. This is a strong upgrade; it wouldn’t have been unusual for Nvidia to reduce the memory bandwidth of the array in order to double the capacity. Instead, the company boosted total bandwidth by 1.25x.
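Those figures follow directly from the bus width and the per-pin data rate. Here is a minimal sketch of the arithmetic, using the rounded rates quoted above; Nvidia’s official datasheet figures (2,039GB/s for the 80GB card, 1,555GB/s for the 40GB card) reflect slightly different effective rates:

```python
# Back-of-the-envelope check of the bandwidth figures quoted above.
# Peak DRAM bandwidth = bus width (bits) * per-pin data rate (Gbps) / 8.

def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits * data_rate_gbps / 8

print(peak_bandwidth_gbs(5120, 3.2))  # 2048.0 GB/s, marketed as ~2TB/s
print(peak_bandwidth_gbs(5120, 2.4))  # 1536.0 GB/s, marketed as ~1.6TB/s
```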

The A100 package carries six physical stacks of HBM2, but Nvidia disables one of them to improve yield. The remaining five stacks each have a 1024-bit memory bus, which is where the 5120-bit figure comes from. For the 80GB model, Nvidia replaced the 40GB A100’s HBM2 with HBM2E, which is what allowed it to substantially upgrade the memory specs.
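A quick sketch of the stack math follows. The per-stack capacities are inferred rather than stated in the article: 80GB spread across five active stacks implies 16GB HBM2E stacks, versus 8GB HBM2 stacks on the 40GB card:

```python
# Bus width and capacity from the stack configuration described above.
# Per-stack capacities are inferred: 16GB HBM2E vs. 8GB HBM2 stacks.
TOTAL_STACKS = 6
ACTIVE_STACKS = TOTAL_STACKS - 1   # one of six stacks disabled for yield

bus_width = ACTIVE_STACKS * 1024   # 1024-bit interface per HBM stack
print(bus_width)                   # 5120 -> the 5120-bit bus on the spec sheet

print(ACTIVE_STACKS * 16)          # 80GB total with 16GB HBM2E stacks
print(ACTIVE_STACKS * 8)           # 40GB total with 8GB HBM2 stacks
```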

The 80GB flavor should benefit workloads that are both capacity-limited and memory-bandwidth-bound. Like the 40GB variant, the A100 80GB supports Nvidia’s Multi-Instance GPU (MIG) partitioning, with up to seven hardware instances and up to 10GB of VRAM dedicated to each.

Nvidia is selling these GPUs on mezzanine cards expected to be deployed in either an HGX or a DGX configuration. Customers who want an individual A100 GPU in a PCIe card are still limited to the 40GB variant, though this could change in the future.

The price tag on a server full of 80GB A100 cards is going to be firmly in “if you have to ask, you can’t afford it” territory. But there’s a reason companies on the cutting edge of AI development might pay so much: the complexity of the models a GPU can train is limited by its onboard memory. If a workload has to spill into main system memory, overall performance craters. CPUs may have the kind of DRAM capacities that AI researchers would love for their models, but they can’t provide the necessary bandwidth (and CPUs aren’t well suited to training neural networks in any case). Expanding the total pool of onboard VRAM lets developers increase the absolute complexity of the models they train, or tackle problems that couldn’t previously fit into a 40GB VRAM pool.
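To make the capacity argument concrete, here is a rough, hypothetical sizing sketch. It assumes FP32 training with an Adam-style optimizer (weight, gradient, and two optimizer states per parameter) and ignores activation memory and framework overhead, so real-world limits are lower:

```python
# Crude upper bound on trainable parameters for a given VRAM pool.
# Assumption: FP32 weight + FP32 gradient + two FP32 Adam states
# = 16 bytes per parameter; activations and overhead are ignored.
BYTES_PER_PARAM = 4 + 4 + 4 + 4

def max_params_billions(vram_gb: float) -> float:
    """Rough ceiling on model size, in billions of parameters."""
    return vram_gb * 1e9 / BYTES_PER_PARAM / 1e9

print(max_params_billions(40))  # ~2.5B parameters on the 40GB A100
print(max_params_billions(80))  # ~5.0B parameters on the 80GB A100
```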
