Intel Positions Xeon as Machine Learning Competitor in Inference Workloads

Most of the time, when we talk about machine learning, artificial intelligence, or similar workloads, we’re discussing either Nvidia GPUs or custom silicon from companies like Google. Companies like AMD and Intel have been smaller players, though Intel still has its Knights Mill and other Xeon Phi products, as well as a low-power play courtesy of Movidius. Artificial intelligence, machine learning, and deep learning are all new markets, even if they represent areas where some researchers have been toiling for decades. As companies move into them, they’re also experimenting with hardware optimizations and best practices that can dramatically improve performance in products you might not expect to be competitive. Case in point: Intel’s Xeon processors.

First, the Stanford team behind DAWNBench, described as “the first deep learning benchmark and competition that measures end-to-end performance: the time/cost required to achieve a state-of-the-art accuracy level for common deep learning tasks, as well as the latency/cost of inference at this state-of-the-art accuracy level,” has released some impressive inference results. If you recall, there’s a difference between the time it takes to train a deep learning neural network and the time it takes for the trained network to apply what it has learned to new data. The second step is known as inference.

Nvidia has published a blog post that goes into some detail on how inference testing works, and on the differences between training a machine learning model and running inference on it. In the inference tests, Intel’s Xeon posted the best overall results, with a per-image latency of just 9.96 ms and the ability to process 10,000 images for just $0.02. In the larger suite of DAWNBench results, the training benchmarks are generally dominated by Nvidia or Google. Volta is, as one might expect, a prominent presence.
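To put those two figures in perspective, here is a minimal Python sketch of the arithmetic behind them. The latency and cost values are the ones quoted above; the function names are hypothetical, and the throughput estimate assumes images are processed strictly one at a time, which may not match how the benchmark system was actually run.

```python
# Figures quoted in the article for the Xeon inference result.
LATENCY_MS = 9.96    # reported per-image latency, in milliseconds
COST_USD = 0.02      # reported cost to process the batch
IMAGES = 10_000      # number of images in that batch


def cost_per_image(total_cost_usd: float, n_images: int) -> float:
    """Average cost of a single inference, in dollars."""
    return total_cost_usd / n_images


def sequential_throughput(latency_ms: float) -> float:
    """Images per second, ASSUMING one image is processed at a time.

    Real systems often batch or run requests in parallel, so this is
    only a rough lower-bound style estimate, not a benchmark result.
    """
    return 1000.0 / latency_ms


if __name__ == "__main__":
    per_image = cost_per_image(COST_USD, IMAGES)
    print(f"Cost per image: ${per_image:.6f}")
    print(f"Cost per million images: ${per_image * 1_000_000:.2f}")
    print(f"Sequential throughput: {sequential_throughput(LATENCY_MS):.1f} images/s")
```

Under those assumptions, the quoted numbers work out to two ten-thousandths of a cent per image and roughly 100 images per second from a single sequential stream.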

Separately, Intel claims that it can dramatically outperform the Volta V100 in certain workloads. The previous results come from an independently developed benchmark; these results come from Intel directly. The company writes:

“The Intel Xeon Scalable processor with optimized software has demonstrated enormous performance gains for deep learning compared to the previous generations without optimized software. For example, compared to the Intel Xeon v3 processor (formerly codename Haswell) the gains are up to 198x for inference and 127x for training.[1] This applies to various types of models including multi-layer perceptron (MLP), convolutional neural networks (CNNs), and the various types of recurrent neural networks (RNNs). The performance gap between GPUs and CPUs for deep learning training and inference has narrowed, and for some workloads, CPUs now have an advantage over GPUs.
“For machine translation which uses RNNs, the Intel Xeon Scalable processor outperforms NVidia* V100* GPU by 4x on the AWS Sockeye Neural Machine Translation (NMT) model with Apache* MXNet* when Intel® Math Kernel Library (Intel MKL) is used.”

We’re going to be seeing a lot more news of this sort in the next few years. AMD, Nvidia, Intel, Google, and a dozen other companies large and small are throwing their hats into the AI/ML ring, and that’s going to mean a lot of slugging it out over bragging rights and performance claims.

My advice for the general reader (as opposed to a research specialist) is to treat these claims as data points to be considered, rather than absolute declarations of victory or defeat. It’s going to take some time to nail down exactly how to compare and evaluate products in this new field, and sometimes the comparisons are going to depend on the use case of the product in question. For example, when evaluating which AI solution is better for a battery-operated device over the long term, a chip that’s extremely power-efficient might be a better option than a chip that’s very fast but has high power consumption.

Intel’s overarching argument (its meta-argument, if you will) is that its CPUs and existing expertise have a role to play in AI and ML, even as it ramps up investment in companies like Movidius and kicks off new GPU projects. I think it’s a fair argument for the company to make, especially considering that Intel and AMD CPUs are going to be powering a lot of servers in the coming years, including servers dedicated to these workloads. But it’s early to award performance crowns, at least definitive ones.