Nvidia Unveils ‘Grace’ Deep-Learning CPU for Supercomputing Applications

Nvidia thinks it’s time for traditional CPUs to step aside when it comes to tackling the largest machine learning tasks, especially training huge models that now run upwards of a trillion parameters. Conventional supercomputers use specialized processors (often GPUs) to do much of the compute-intensive math during training, but GPUs typically can’t hold nearly as much memory as these models need, let alone share it quickly in multi-GPU configurations. As a result, these machines are often bottlenecked by how fast data can move from CPU memory to the GPUs and back.
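To put the memory problem in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch under assumptions that are not from Nvidia: weights stored at 2 bytes per parameter (FP16/BF16), and 80 GB of memory per GPU. It counts the weights alone. Training also needs room for gradients, optimizer state, and activations, which multiplies the total several times over.

```python
# Rough memory-footprint arithmetic for a very large model's weights.
# Assumptions (illustrative, not from the article): 2 bytes per parameter
# (FP16/BF16 weights only, no gradients/optimizer state/activations)
# and 80 GB of memory per GPU.
import math

GB = 10**9


def weight_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Bytes needed just to store the model's weights."""
    return num_params * bytes_per_param


def gpus_needed(total_bytes: int, gpu_mem_bytes: int) -> int:
    """Minimum number of GPUs whose combined memory could hold total_bytes."""
    return math.ceil(total_bytes / gpu_mem_bytes)


params = 10**12  # one trillion parameters
total = weight_bytes(params)
print(total // GB, "GB of weights")
print(gpus_needed(total, 80 * GB), "GPUs' worth of memory at 80 GB each")
```

Even this lower bound spreads a single model across dozens of GPUs. That is why moving data between devices, and between GPUs and CPU memory, becomes the limiting factor.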

With Nvidia’s new Grace deep-learning CPU, which the company unveiled today, supercomputer GPUs get both much faster access to CPU memory and much better aggregate bandwidth when multiple GPUs are paired with a single CPU. Nvidia says the chip took 10,000 engineering years of work, but given the company’s gift for hyperbole, we’re not sure what that figure includes. And as you may have already guessed, the Arm-based CPU is named after Grace Hopper, an early computing pioneer.

Key to Grace’s performance gain is Nvidia’s NVLink interconnect between the CPU and multiple GPUs. Nvidia says NVLink can move 900 GB/s, many times the bandwidth typically available between a CPU and a GPU. Grace’s own memory is also faster, since the CPU will use LPDDR5x RAM. Grace will also offer a unified memory space and cache coherence between the CPU and GPUs, which should make GPGPU programming less of a headache. It will support Nvidia’s CUDA and CUDA-X libraries, along with its HPC software development kit.
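A quick calculation shows why that bandwidth matters. The sketch below times a 2,000 GB transfer, roughly the FP16 weights of a trillion-parameter model (an assumption). It compares Nvidia’s quoted 900 GB/s against about 64 GB/s, the figure used here for a PCIe 4.0 x16 link with both directions combined. The PCIe number and the like-for-like comparison are our assumptions, not Nvidia’s, so treat the ratio as illustrative only.

```python
# Back-of-the-envelope transfer-time comparison.
# Assumptions: 900 GB/s is the NVLink figure quoted by Nvidia; ~64 GB/s is
# used as a stand-in for a PCIe 4.0 x16 link (both directions combined);
# the payload approximates 1e12 FP16 parameters.
GB = 1e9


def transfer_seconds(num_bytes: float, bandwidth_gb_s: float) -> float:
    """Time to move num_bytes at a sustained bandwidth given in GB/s."""
    return num_bytes / (bandwidth_gb_s * GB)


payload = 2000 * GB
nvlink = transfer_seconds(payload, 900.0)
pcie = transfer_seconds(payload, 64.0)
print(f"NVLink: {nvlink:.2f} s, PCIe: {pcie:.2f} s, ratio {pcie / nvlink:.1f}x")
```

Real transfers rarely sustain peak bandwidth, so both times are best-case lower bounds. The point is the order-of-magnitude gap, not the exact seconds.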

Overall, Nvidia says a Grace-powered system will be able to train very large models (think a trillion parameters and up) and run similarly sized simulations up to 10 times faster than a comparable system built on x86 CPUs. Nvidia was careful to stress, though, that it doesn’t see Grace displacing x86 processors for smaller workloads.

As exciting as Grace’s potential is, it will be a while before anyone gets to work with one: Nvidia expects the chip to be available in 2023. The company also announced that the Swiss National Supercomputing Centre (CSCS) and Los Alamos National Laboratory plan to build massive new supercomputers using Grace CPUs. Hewlett Packard Enterprise will build the machines, which are expected to come online that same year. Both customers say they are excited that the new machines will let them analyze larger datasets than before and improve the performance of their scientific computing software.