Intel’s Cascade Lake With DL Boost Goes Head to Head with Nvidia’s Titan RTX in AI Tests


For the past few years, Intel has talked up its Cascade Lake servers with DL Boost (also known as VNNI, Vector Neural Network Instructions). These new capabilities are part of the AVX-512 family and are intended specifically to accelerate CPU performance in AI applications. Historically, many AI applications have favored GPUs over CPUs. The architecture of GPUs (massively parallel processors with low single-thread performance) has been a much better fit for these workloads than that of CPUs. CPUs offer far more execution resources per thread, but even today’s multi-core CPUs are dwarfed by the parallelism available in a high-end GPU.
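To make the idea concrete, here is a rough sketch of the kind of operation VNNI speeds up: a fused 8-bit multiply-accumulate. It emulates, in plain Python, one 32-bit lane of a VPDPBUSD-style instruction, which multiplies four unsigned bytes by four signed bytes and adds the products to a 32-bit accumulator. The function names (`vnni_lane`, `int8_dot`) are made up for illustration, and the code models only the arithmetic, not how real hardware or any library performs it.

```python
def to_int32(value):
    """Wrap a Python int to a signed 32-bit value (non-saturating)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def vnni_lane(acc, unsigned_bytes, signed_bytes):
    """Emulate one 32-bit lane of a VPDPBUSD-style step (illustrative only).

    unsigned_bytes: four values in 0..255 (e.g. quantized activations)
    signed_bytes:   four values in -128..127 (e.g. quantized weights)
    acc:            signed 32-bit accumulator
    """
    if len(unsigned_bytes) != 4 or len(signed_bytes) != 4:
        raise ValueError("each lane consumes exactly four byte pairs")
    if any(not 0 <= u <= 255 for u in unsigned_bytes):
        raise ValueError("unsigned operand out of 8-bit range")
    if any(not -128 <= s <= 127 for s in signed_bytes):
        raise ValueError("signed operand out of 8-bit range")
    products = sum(u * s for u, s in zip(unsigned_bytes, signed_bytes))
    return to_int32(acc + products)


def int8_dot(activations, weights):
    """Dot product of two int8 vectors, four byte pairs per step."""
    if len(activations) != len(weights) or len(activations) % 4:
        raise ValueError("vectors must have equal length, a multiple of 4")
    acc = 0
    for start in range(0, len(activations), 4):
        acc = vnni_lane(acc,
                        activations[start:start + 4],
                        weights[start:start + 4])
    return acc
```

A 512-bit register holds 16 such lanes, so one instruction covers 64 byte pairs at once. Without a fused instruction, the same multiply, widen and accumulate work takes several separate steps, which is why quantized 8-bit inference is where this kind of acceleration matters most.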

Anandtech has compared the performance of Cascade Lake, the Epyc 7601 (soon to be surpassed by AMD’s 7nm Rome CPUs, but still AMD’s leading server core today), and a Titan RTX. The article, by the excellent Johan De Gelas, discusses different types of neural nets beyond the CNNs (Convolutional Neural Networks) that are typically benchmarked. It also explains that a key part of Intel’s strategy is to compete against Nvidia in workloads where GPUs are not as strong or cannot yet serve the emerging needs of the market. Those include workloads constrained by memory capacity (GPUs still can’t match CPUs here), ‘light’ AI models that don’t require long training times, and AI models that depend on non-neural network statistical models.

Growing data center revenue is a critical component of Intel’s overall push into AI and machine learning. Intel’s AI strategy is broad and encompasses multiple products, from Movidius and Nervana to DL Boost on Xeon, to the upcoming Xe line of GPUs. The company is building new AI capabilities into existing products, fielding new hardware that it hopes will impact the market, and trying to build its first serious GPU to challenge the work AMD and Nvidia do across the consumer space. Nvidia, meanwhile, is keen to protect a market that it currently competes in virtually alone, and is seeking to show that GPUs can be used to handle AI calculations in a broader range of workloads.

What Anandtech’s benchmarks show, in aggregate, is that the gulf between Intel and Nvidia remains wide, even with DL Boost. The Recurrent Neural Network test graphed here used a “Long Short-Term Memory (LSTM) network as neural network. A type of RNN, LSTM selectively ‘remembers’ patterns over a certain duration of time.” Anandtech also tested it with three different TensorFlow configurations: out-of-the-box TensorFlow installed with conda, an Intel-optimized TensorFlow from PyPI, and a build of the very latest TensorFlow optimized from source using Bazel.
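For readers unfamiliar with how an LSTM “selectively remembers,” the following is a minimal sketch of one time step of a single-unit LSTM cell, written with the standard gate equations in plain Python. It is not the benchmark code Anandtech ran. The function names and the two parameter sets (`REMEMBER`, `OVERWRITE`) are hypothetical values chosen only to make the gating behavior visible.

```python
import math


def sigmoid(x):
    """Logistic function, squashing x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with scalar input.

    params maps each gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state h and the new cell state c.
    """
    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre_activation("forget"))        # how much old memory to keep
    write = sigmoid(pre_activation("input"))          # how much new input to store
    expose = sigmoid(pre_activation("output"))        # how much memory to reveal
    candidate = math.tanh(pre_activation("candidate"))

    c = forget * c_prev + write * candidate
    h = expose * math.tanh(c)
    return h, c


# Hypothetical parameters: keep the old memory and ignore new input.
REMEMBER = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

# Hypothetical parameters: discard the old memory and store the new input.
OVERWRITE = {
    "forget": (0.0, 0.0, -20.0),
    "input": (0.0, 0.0, 20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
```

With `REMEMBER`, the cell state passes through a step almost unchanged no matter what the input is. With `OVERWRITE`, the previous state is dropped and replaced by the squashed input. A trained network learns gate weights somewhere between these extremes. Each step depends on the previous step's output, which limits how much of the sequence can be processed in parallel.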

Image by Anandtech

This pair of images captures relative scaling between the CPUs as well as the comparison against the Titan RTX. Out-of-the-box performance was quite poor on AMD, though it improved with the optimized code. Intel’s performance shot up like a rocket when the source-optimized version was tested, but even that version fell well short of the Titan RTX. De Gelas notes: “Secondly, we were quite amazed that our Titan RTX was less than 3 times faster than our dual Xeon setup,” which tells you something about how lopsided these comparisons tend to be in the larger article.

DL Boost isn’t enough to close the gap between Intel and Nvidia, but in fairness, it probably was never supposed to be. Intel’s goal here is to improve AI performance enough on Xeon to make running these workloads plausible on servers that will be mostly used for other things, or when building AI models that don’t fit within the constraints of a modern GPU. The company’s longer-term goal is to compete in the AI market with a range of equipment, not just Xeons. With Xe not quite ready yet, competing in the HPC space right now means competing with Xeon.

For those of you wondering about AMD, the company isn’t really talking about running AI workloads on Epyc CPUs. It has focused instead on its ROCm initiative, which includes tools for porting CUDA code to run on AMD GPUs. AMD does not talk about this side of its business very much, and Nvidia dominates the market for AI and HPC GPU applications. Both AMD and Intel want a piece of the space. Right now, both appear to be fighting uphill to claim one.
