Nvidia Unveils Ampere A100 80GB GPU With 2TB/s of Memory Bandwidth


Ampere only launched six months ago, but Nvidia is already upgrading the top-end version of its GPU to offer even more VRAM and considerably more bandwidth. The A100 (80GB) keeps most of the A100 (40GB)’s specifications: the 1.41GHz boost clock, 5120-bit memory bus, 19.5 TFLOPS of single-precision performance, NVLink 3 support, and 400W TDP are all unchanged from the previous iteration of the GPU. Both chips also feature 6,912 CUDA cores.

What’s different is the maximum amount of VRAM (80GB, up from 40GB) and the memory speed (3.2Gbps HBM2E, rather than 2.4Gbps HBM2). Bandwidth across the entire memory array is 2TB/s, up from 1.6TB/s. This is a strong upgrade. It wouldn’t have been unusual for Nvidia to reduce the memory bandwidth of the array in order to double the capacity. Instead, the company boosted the total bandwidth by 1.25x.
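For readers who want to check the arithmetic, here is a minimal Python sketch that derives peak bandwidth from the bus width and the per-pin data rate quoted above. The function name is made up for illustration, and the calculation assumes every pin on the 5120-bit bus runs at the nominal rate. The 40GB result comes out a little under the quoted 1.6TB/s, which suggests the headline figures are rounded, so treat the output as a sanity check rather than an official spec.

```python
def hbm_bandwidth_gb_per_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s, assuming every bus pin runs at pin_rate_gbps."""
    return bus_width_bits * pin_rate_gbps / 8  # 8 bits per byte


BUS_WIDTH_BITS = 5120  # memory bus width quoted in the article

bw_40gb = hbm_bandwidth_gb_per_s(BUS_WIDTH_BITS, 2.4)  # HBM2 at 2.4Gbps per pin
bw_80gb = hbm_bandwidth_gb_per_s(BUS_WIDTH_BITS, 3.2)  # HBM2E at 3.2Gbps per pin

print(f"A100 40GB: {bw_40gb:.0f} GB/s")
print(f"A100 80GB: {bw_80gb:.0f} GB/s")
print(f"Ratio from per-pin rates: {bw_80gb / bw_40gb:.2f}x")
print(f"Ratio from rounded headline figures: {2.0 / 1.6:.2f}x")
```

The per-pin rates imply a gain of about 1.33x, while the rounded 2TB/s and 1.6TB/s figures give 1.25x. Either way, bandwidth went up alongside capacity.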


The A100 features six stacks of HBM2, as you can see in the image above, but Nvidia disables one of the stacks to improve yield. The remaining five stacks each have a 1024-bit memory bus, which is where the 5120-bit bus figure comes from. Nvidia replaced the HBM2 on the 40GB A100 with HBM2E, which allowed it to substantially upgrade the base specs.
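The stack math is simple enough to sketch in a few lines of Python. The constant names are illustrative, and the per-stack capacity assumes VRAM is split evenly across the five active stacks, which the article implies but does not state outright.

```python
TOTAL_STACKS = 6       # HBM stacks physically on the package
DISABLED_STACKS = 1    # one stack is disabled to improve yield
BITS_PER_STACK = 1024  # memory bus width of each stack

active_stacks = TOTAL_STACKS - DISABLED_STACKS
bus_width_bits = active_stacks * BITS_PER_STACK


def capacity_per_stack_gb(total_vram_gb: float, stacks: int) -> float:
    """Per-stack capacity, assuming VRAM is divided evenly across active stacks."""
    return total_vram_gb / stacks


print(f"Active stacks: {active_stacks}")
print(f"Total bus width: {bus_width_bits} bits")
for vram_gb in (40, 80):
    per_stack = capacity_per_stack_gb(vram_gb, active_stacks)
    print(f"A100 {vram_gb}GB: {per_stack:.0f}GB per stack")
```

Under that assumption, doubling total capacity on the same five-stack layout means each stack holds 16GB instead of 8GB.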

The 80GB flavor should benefit workloads that are both capacity-limited and memory bandwidth-bound. Like the 40GB variant, the A100 80GB can be partitioned into up to seven hardware instances, now with up to 10GB of VRAM dedicated to each, double the 5GB per instance available on the 40GB card.

Nvidia is selling these GPUs in mezzanine cards expected to be deployed in either an HGX or a DGX configuration. Customers who want an individual A100 GPU in a PCIe card are still limited to the 40GB variant, though this could change in the future.

The price tag on a server full of 80GB A100 cards is going to be firmly in “if you have to ask, you can’t afford it” territory. But there’s a reason companies on the cutting edge of AI development might pay so much. The complexity of a model you can run on a GPU is limited by its onboard memory. If you have to touch main system memory, overall performance will crater. CPUs may have the kind of DRAM capacities that AI researchers would love for their models, but they can’t provide the necessary bandwidth (and CPUs aren’t great for modeling neural networks in any case). Expanding the total pool of onboard VRAM may allow developers to increase the absolute complexity of the model they’re training or to tackle problems that couldn’t previously fit into a 40GB VRAM pool.
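To put a rough number on that ceiling, here is a back-of-the-envelope Python sketch. It is not from Nvidia and rests on an assumption the article does not make: training in 32-bit precision with an Adam-style optimizer, which costs about 16 bytes per parameter (4 for the weight, 4 for its gradient, 8 for two optimizer moments). It ignores activations, framework overhead, and multi-GPU sharding, so real limits will be lower on a single card. The function name is hypothetical.

```python
BYTES_PER_GB = 10**9


def max_trainable_params(vram_gb: float, bytes_per_param: int = 16) -> float:
    """Upper bound on parameter count that fits in VRAM.

    Assumes 16 bytes per parameter by default (FP32 weight + FP32 gradient +
    two FP32 optimizer moments) and ignores activations and other overhead.
    """
    return vram_gb * BYTES_PER_GB / bytes_per_param


for vram_gb in (40, 80):
    params = max_trainable_params(vram_gb)
    print(f"{vram_gb}GB VRAM: roughly {params / 1e9:.1f} billion parameters")
```

Under these assumptions the ceiling scales linearly with VRAM, so the 80GB card doubles the largest model that can be trained without spilling into system memory.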
