Nvidia’s Jetson Xavier Stuffs Volta Performance Into Tiny Form Factor

This week, Nvidia unveiled its Jetson Xavier platform, a new compute board with significantly higher performance than the previous models from Team Green. Up until now, the company has offered the Jetson TK1 (2014), TX1 (2015), and TX2 (2017) as edge compute devices for AI workloads. The TK1 was built around Kepler, the TX1 used Maxwell, the TX2 is based on Pascal, and Xavier is, as one might expect, based on Volta.

The new board packs 512 GPU cores (TX1 and TX2 were both 256-core solutions) alongside an eight-core ARM CPU of unspecified vintage. Nvidia has not clarified whether this is a further evolution and refinement of its Denver CPU core or whether the company is using a bog-standard ARM Cortex design. Nvidia mentions ARMv8.2, which is interesting, because ARM’s own blog states that 8.2 included support for an “enhanced memory model, half-precision floating point data processing and introduces both RAS (reliability availability serviceability) support and statistical profiling extension (SPE).” The Scalable Vector Extension (SVE) is also now supported in that amended instruction set.

Other upgrades from TX2 to Jetson Xavier include double the RAM (8GB to 16GB) at more than double the bandwidth (59.7GB/s to 137GB/s) and a pair of new Nvidia-specific deep learning accelerators. Nvidia’s NVDLA.org site describes the NVDLA as an inference-processing solution for various types of machine learning workloads. The exact text states:

NVDLA hardware is comprised of the following components:

Convolution Core – optimized high-performance convolution engine.
Single Data Processor – single-point lookup engine for activation functions.
Planar Data Processor – planar averaging engine for pooling.
Channel Data Processor – multi-channel averaging engine for advanced normalization functions.
Dedicated Memory and Data Reshape Engines – memory-to-memory transformation acceleration for tensor reshape and copy operations.

The same site notes that the configurations are modular and intended to be adjusted depending on the needs of the customer, so it’s not clear exactly which configuration Nvidia is shipping with Xavier (the company’s documentation walks through two examples, a “small” and a “large” NVDLA model).

Nvidia says the Xavier board can stretch to fit a variety of usage models at TDPs ranging from 10W to 30W, and claims the platform can hit 10 TFLOPS of FP16 and 20 TOPS using INT8. FP32 performance is 5 TFLOPS. The board is a significant step forward for Nvidia’s overall AI and ML performance in this form factor and comes on the heels of announcements like the HGX-2, a much larger, ‘big iron’ server configuration intended for labs with far more cash to drop and more power to burn. The HGX-2 can draw 10 kilowatts which, as Next Platform notes, is a bit of a gamechanger for this kind of workload and capability. At 30W, the Jetson Xavier is intended for much more modest uses and platforms, where it still brings far more performance to the table than its predecessor.
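To put those figures in proportion, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above and assumes, since Nvidia has not said otherwise, that the peak throughput claims apply at the 30W ceiling rather than at 10W. The function and constant names are illustrative and do not come from any Nvidia tool.

```python
# Figures quoted in the article. Assumption: peak throughput is reached
# at the top of the 10W-30W TDP range.
PEAK_THROUGHPUT = {"FP32": 5.0, "FP16": 10.0, "INT8": 20.0}  # TFLOPS (TOPS for INT8)
TDP_MAX_W = 30.0
HGX2_POWER_W = 10_000.0  # the HGX-2's quoted 10 kilowatts


def per_watt(throughput_tera: float, watts: float) -> float:
    """Return tera-operations per second delivered per watt."""
    return throughput_tera / watts


def scaling_vs_fp32(table: dict) -> dict:
    """Express each precision's peak as a multiple of the FP32 peak."""
    base = table["FP32"]
    return {name: value / base for name, value in table.items()}


if __name__ == "__main__":
    for name, value in PEAK_THROUGHPUT.items():
        efficiency = per_watt(value, TDP_MAX_W)
        print(f"{name}: {efficiency:.2f} tera-ops/s per watt at {TDP_MAX_W:.0f}W")
    print("Scaling vs. FP32:", scaling_vs_fp32(PEAK_THROUGHPUT))
    print(f"HGX-2 power draw vs. Xavier: about {HGX2_POWER_W / TDP_MAX_W:.0f}x")
```

On these assumptions, each halving of precision doubles the claimed peak throughput, and the HGX-2 draws a few hundred times the power of a single Xavier board.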
