Nvidia Crushes Self to Take AI Benchmark Crown

One of the tricky parts of keeping up with the innovation in AI hardware is that each vendor showcases applications and benchmarks that bring out the best in its own products. Since the field is relatively new, there haven’t been any good, broad benchmarks to use for comparison. ImageNet is one perennial favorite, but with many new applications and network architectures being deployed, simple object recognition in 2D images doesn’t tell us much about which hardware is fastest or best suited to other workloads.

Now, a team of industry heavyweights, including Google, Intel, Baidu, and Nvidia, has stepped up to meet the need with an early version (currently v0.5) of MLPerf, a benchmarking suite for machine learning that includes training a variety of networks. Nvidia announced that it has topped the initial results, but digging into the details shows it was pretty much the only game in town. If nothing else, it shows how dominant the GPU maker has been in the AI market.

MLPerf v0.5: Covering the Training Waterfront

MLPerf currently consists of tests that time network training in seven application areas, starting with the classic standby of training ResNet-50 on ImageNet. It adds lightweight and heavyweight Object Detection (COCO), Recurrent and non-Recurrent Translation (WMT E-G), Recommendation (MovieLens-20M), and Reinforcement Learning (Mini Go). The only platform with results for all seven is the reference submission run on a Pascal P100. Inferencing benchmarks are planned for future versions.
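To make "timing network training" concrete, here is a minimal Python sketch of a time-to-train measurement: run training until an evaluation score first reaches a target quality, and report the elapsed wall-clock time. This is an illustration only, not MLPerf's reference harness. The names `time_to_train`, `speedup`, `train_one_epoch`, `evaluate`, and `target_quality` are hypothetical, and the per-epoch evaluation schedule is an assumption.

```python
import time
from typing import Callable, Optional


def time_to_train(
    train_one_epoch: Callable[[], None],
    evaluate: Callable[[], float],
    target_quality: float,
    max_epochs: int = 100,
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return seconds elapsed until evaluate() first reaches target_quality.

    Returns None if the target is not reached within max_epochs.
    All callables are hypothetical stand-ins supplied by the caller.
    """
    start = clock()
    for _ in range(max_epochs):
        train_one_epoch()                    # one pass over the training data
        if evaluate() >= target_quality:     # e.g. accuracy on a held-out set
            return clock() - start
    return None


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds
```

Under this sketch, a system that reaches the target in 30 minutes when the baseline takes 60 gets `speedup(60, 30) == 2.0`, which is the kind of ratio behind "2x faster" claims.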

Most of Nvidia’s results were run on one or more DGX-1 or DGX-2 supercomputers, and Google’s were run on its v2 and v3 TPU processors. Intel submitted some ImageNet times for its SKX 8180, but none of them were very competitive. However, systems using its $10K, 28-core SKX 8180 were the only competitive submissions in the reinforcement learning category. That category is likely to be short-lived once there’s a non-CPU-bound version of that benchmark available.

With enough high-end GPUs, neural network training is a lot less painful. These results are anywhere from 2x to 10x faster than the fastest single-node results.

What This Really Shows Is That Nvidia Hardware Dominates AI

As a practical matter, most AI training is done on Nvidia hardware, not just because of the price-performance of its GPUs but also because of the prevalence of CUDA-based tools. Google did submit some benchmarks for its TPUs, but its chips were originally built as an inferencing tool, and only in the latest generation have they started to be used for training tasks. AMD, meanwhile, is nowhere to be found in the benchmark results, although it is one of the listed supporters of the MLPerf effort, so that will presumably change. In the meantime, Nvidia is pretty much competing with itself.
