Intel’s Cascade Lake With DL Boost Goes Head to Head with Nvidia’s Titan RTX in AI Tests

For the past few years, Intel has talked up its Cascade Lake servers with DL Boost (also known as VNNI, Vector Neural Network Instructions). These new capabilities are a subset of AVX-512 and are intended to specifically accelerate CPU performance in AI applications. Historically, many AI applications have favored GPUs over CPUs. The architecture of GPUs, massively parallel processors with low single-thread performance, has been a much better fit for these workloads than that of CPUs. CPUs offer far more execution resources per thread, but even today’s multi-core CPUs are dwarfed by the parallelism available in a high-end GPU.
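To make the idea behind VNNI concrete, here is a minimal Python sketch of the kind of operation it speeds up: multiplying four unsigned 8-bit values by four signed 8-bit values, summing the products, and adding the result to a 32-bit accumulator. This models a single 32-bit lane of the non-saturating form of the instruction in plain Python; the function names `dot4_accumulate` and `dot_product_int8` are illustrative assumptions, not part of any Intel library, and real hardware processes 16 such lanes of a 512-bit register at once.

```python
def dot4_accumulate(acc, a_u8, b_s8):
    """Model one 32-bit lane of an int8 dot-product-accumulate step.

    acc  : signed 32-bit accumulator value
    a_u8 : four unsigned 8-bit values (0..255), e.g. activations
    b_s8 : four signed 8-bit values (-128..127), e.g. weights
    """
    assert len(a_u8) == 4 and len(b_s8) == 4
    assert all(0 <= x <= 255 for x in a_u8)
    assert all(-128 <= y <= 127 for y in b_s8)

    # Multiply pairwise, sum the four products, add to the accumulator.
    total = acc + sum(x * y for x, y in zip(a_u8, b_s8))

    # Wrap to a signed 32-bit result (no saturation in this sketch).
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


def dot_product_int8(a_u8, b_s8):
    """Dot product of two equal-length int8 vectors, four elements per step.

    Assumes the length is a multiple of four, as a hardware lane would.
    """
    assert len(a_u8) == len(b_s8) and len(a_u8) % 4 == 0
    acc = 0
    for i in range(0, len(a_u8), 4):
        acc = dot4_accumulate(acc, a_u8[i:i + 4], b_s8[i:i + 4])
    return acc
```

Quantized neural network inference spends most of its time in dot products like this one, which is why collapsing the multiply, sum, and accumulate steps into fewer instructions matters for CPU performance.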

Anandtech has compared the performance of Cascade Lake, the Epyc 7601 (soon to be surpassed by AMD’s 7nm Rome CPUs, but still AMD’s leading server core today), and a Titan RTX. The article, by the excellent Johan De Gelas, discusses different types of neural nets beyond the CNNs (Convolutional Neural Networks) that are typically benchmarked. It also explains how a key part of Intel’s strategy is to compete against Nvidia in workloads where GPUs are not as strong or cannot yet serve the emerging needs of the market. Those include workloads constrained by memory capacity (GPUs still can’t match CPUs here), ‘light’ AI models that don’t require long training times, and AI models that depend on non-neural network statistical models.

Growing data center revenue is a critical component of Intel’s overall push into AI and machine learning. Nvidia, meanwhile, is keen to protect a market that it currently competes in virtually alone. Intel’s AI strategy is broad and encompasses multiple products, from Movidius and Nervana to DL Boost on Xeon, to the upcoming Xe line of GPUs. Nvidia is seeking to show that GPUs can be used to handle AI calculations in a broader range of workloads. Intel is building new AI capabilities into existing products, fielding new hardware that it hopes will impact the market, and trying to build its first serious GPU to challenge the work AMD and Nvidia do across the consumer space.

What Anandtech’s benchmarks show, in aggregate, is that the gulf between Intel and Nvidia remains wide, even with DL Boost. The Recurrent Neural Network test used a “Long Short-Term Memory (LSTM) network as neural network. A type of RNN, LSTM selectively “remembers” patterns over a certain duration of time.” Anandtech also used three different configurations to test it: out-of-the-box TensorFlow with conda, an Intel-optimized TensorFlow with PyPI, and a version of TensorFlow optimized from source using Bazel, using the very latest version of TensorFlow.

This pair of images captures relative scaling between the CPUs as well as the comparison against the Titan RTX. Out-of-the-box performance was quite poor on AMD, though it improved with the optimized code. Intel’s performance shot up like a rocket when the source-optimized version was tested, but even the source-optimized version didn’t come close to matching Titan RTX performance. De Gelas notes: “Secondly, we were quite amazed that our Titan RTX was less than 3 times faster than our dual Xeon setup,” which tells you something about how these comparisons run within the larger article.

DL Boost isn’t enough to close the gap between Intel and Nvidia, but in fairness, it probably was never supposed to be. Intel’s goal here is to improve AI performance enough on Xeon to make running these workloads plausible on servers that will be mostly used for other things, or when building AI models that don’t fit within the constraints of a modern GPU. The company’s longer-term goal is to compete in the AI market with a range of equipment, not just Xeons. With Xe not quite ready yet, competing in the HPC space right now means competing with Xeon.

For those of you wondering about AMD, the company isn’t really talking about running AI workloads on Epyc CPUs, but has focused on its ROCm initiative, which includes tools for porting CUDA code to run on AMD GPUs. AMD does not talk about this side of its business very much, and Nvidia dominates the market for AI and HPC GPU applications. Both AMD and Intel want a piece of the space. Right now, both appear to be fighting uphill to claim one.
