At ISSCC last week (that’s the International Solid-State Circuits Conference), AMD spoke about the design considerations that led to its Epyc server processors and why the company is confident that its approach to server CPU development will yield significant dividends compared with Intel’s practices.
While both companies compete in the x86-64 server market, they’ve taken very different approaches to their high-end processors. Intel favors what’s known as a monolithic die design, in which all of the cores sit on a single die mounted to the CPU package. As core counts scale up, that die becomes larger and larger, and the more cores it holds, the trickier it is to ensure that each core has appropriate access to the L3 cache at a single, consistent latency.
While Intel has never released formal die sizes, Anandtech claims 10-core Skylake-SP CPUs weigh in at 322mm², 18-core chips at 484mm², and 28-core chips at 698mm². We have no idea how good Intel’s yields on chips like the Core i9-7980XE are, but one can reasonably expect them to be at least slightly lower than those of a four-core or eight-core part. This is why companies use die recovery, locking off bad cores rather than throwing the CPU away.
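To make the link between die area and yield concrete, here is a minimal sketch using the textbook Poisson yield model, Y = exp(-D0 × A), where A is die area and D0 is defect density. The defect density below (0.1 defects per cm²) is a hypothetical value chosen for illustration; Intel does not publish its real figures. The die areas are the Anandtech estimates quoted above.

```python
import math

# Hypothetical defect density: 0.1 defects per cm^2 = 0.001 per mm^2.
# Real foundry defect densities are not public; this is an assumption.
DEFECT_DENSITY_PER_MM2 = 0.001

def poisson_yield(area_mm2: float, d0: float = DEFECT_DENSITY_PER_MM2) -> float:
    """Expected fraction of dies with zero defects under a Poisson model."""
    return math.exp(-d0 * area_mm2)

# Skylake-SP die sizes as estimated by Anandtech (core count -> mm^2).
SKYLAKE_SP_DIES = {10: 322, 18: 484, 28: 698}

if __name__ == "__main__":
    for cores, area in SKYLAKE_SP_DIES.items():
        print(f"{cores:>2}-core die, {area} mm^2: "
              f"{poisson_yield(area):.1%} defect-free")
```

Under these assumptions the defect-free fraction drops from roughly 72 percent for the 10-core die to roughly 50 percent for the 28-core die. The model ignores die recovery: a die with a defect in one core can still be sold as a lower-core-count part, which is exactly why vendors use it.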
Intel, with its deep pockets, can afford to build these monolithic dies for high-end server and workstation chips, but the difficulty of doing so is why it typically takes the company months longer to launch new high-core-count server CPUs than their lower-core-count consumer counterparts.
Those of you who have followed AMD over the past few years know just how precarious the company’s financial situation was during the Bulldozer era, which limited how many distinct chips it could afford to design. To date, AMD has introduced just two dies: the Ryzen 7 1800X die, which is used for every Ryzen CPU without integrated graphics, and the Ryzen 5 2400G die, which combines a quad-core CPU with an on-die GPU. One of the key criteria for AMD’s new server initiative with Epyc was to find a way to scale its eight-core Ryzen 7 building block into server processors that could challenge Intel across the product stack.
Intel’s 18-core Core i9-7980XE is faster than AMD’s 16-core Threadripper 1950X, but Intel’s design is also much more difficult to scale. Right now, Intel’s Core i9 family uses LGA2066, while its high-core-count Xeon parts use LGA3647. It’s not clear whether Intel can scale LGA2066 to higher core counts without forcing buyers onto a new motherboard platform at an even larger price premium, and that’s before we get to the $1,000 price difference between the Threadripper 1950X ($1,000, 16 cores) and the Intel Core i9-7980XE ($2,000, 18 cores).
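The price gap is easier to judge per core. The short sketch below simply divides the rounded list prices quoted above by each chip’s core count; it says nothing about per-core performance, and street prices may differ.

```python
# Rounded list prices (USD) and core counts as quoted in the text.
CPUS = {
    "Threadripper 1950X": {"price_usd": 1000, "cores": 16},
    "Core i9-7980XE": {"price_usd": 2000, "cores": 18},
}

def price_per_core(price_usd: float, cores: int) -> float:
    """Dollars paid per physical core."""
    return price_usd / cores

if __name__ == "__main__":
    for name, spec in CPUS.items():
        cost = price_per_core(spec["price_usd"], spec["cores"])
        print(f"{name}: ${cost:.2f} per core")
```

That works out to $62.50 per core for the Threadripper 1950X and about $111.11 per core for the Core i9-7980XE, so the Intel chip costs roughly 78 percent more per core.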
AMD, in contrast, has a path to a 32-core Threadripper right now. It can ramp Threadripper to 24 or 32 cores simply by increasing the number of active dies in the MCM under the heatspreader.
The MCM design isn’t without drawbacks. AMD estimates that using a multi-chip module costs it a 10 percent area penalty, but that penalty is dwarfed by the hit a monolithic die would inflict on yields and cost. The MCM structure also allowed AMD to move to eight DDR4 memory channels (it’s more accurate to say that Epyc is a 4×2 design in which each die has its own dual-channel DDR4 memory controller). A four-die Epyc CPU offers 128 PCIe 3.0 lanes, and a dual-socket system also exposes 128 lanes, because each socket devotes 64 of its lanes to the socket-to-socket link. On the other hand, power consumption tests have shown that while AMD uses less power per core than Intel does, the Infinity Fabric appears to burn more power than the mesh interconnect Intel uses in Skylake-SP.
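Why can a 10 percent area penalty still come out ahead? The sketch below compares the wafer area consumed per good CPU for one large die versus the same design split across four smaller dies that are 10 percent larger in total. It reuses the Poisson yield model with a hypothetical defect density, assumes each small die is tested before packaging (so a defect scraps only that die), and uses a hypothetical 780mm² monolithic die size purely for illustration. It ignores packaging cost, die recovery, and wafer-edge losses.

```python
import math

# Hypothetical defect density (0.1 defects/cm^2); real figures are not public.
D0_PER_MM2 = 0.001
# AMD's estimated area overhead for the multi-chip module, per the text.
MCM_AREA_PENALTY = 0.10

def poisson_yield(area_mm2: float, d0: float = D0_PER_MM2) -> float:
    """Expected fraction of dies with zero defects (Poisson model)."""
    return math.exp(-d0 * area_mm2)

def silicon_per_good_monolithic(area_mm2: float) -> float:
    """mm^2 of wafer consumed per defect-free monolithic die."""
    return area_mm2 / poisson_yield(area_mm2)

def silicon_per_good_mcm(mono_area_mm2: float, n_dies: int = 4,
                         penalty: float = MCM_AREA_PENALTY) -> float:
    """mm^2 of wafer consumed per package built from n defect-free dies.

    Assumes each die is tested before packaging, so a defect scraps only
    that small die rather than the whole package.
    """
    die_area = mono_area_mm2 * (1 + penalty) / n_dies
    return n_dies * die_area / poisson_yield(die_area)

if __name__ == "__main__":
    # Hypothetical 32-core monolithic die size, chosen for illustration only.
    mono = 780.0
    m = silicon_per_good_monolithic(mono)
    c = silicon_per_good_mcm(mono)
    print(f"monolithic: {m:.0f} mm^2 per good CPU")
    print(f"4-die MCM:  {c:.0f} mm^2 per good CPU ({c / m:.2f}x)")
```

With these made-up inputs the MCM consumes noticeably less silicon per good CPU at large sizes, while for a small design (around 100mm²) the 10 percent overhead is never recovered and the monolithic die wins. That crossover is consistent with the argument above: the MCM approach pays off specifically at high core counts.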
AMD wasn’t willing to say much about how it intends to improve Epyc in future generations, but it was bullish on Epyc’s performance to date. Comprehensive data on server benchmarks is hard to come by, but a 2017 review by Johan De Gelas for Anandtech showed Epyc to be a strong competitor to Xeon in a number of tests, while outperforming it robustly in FPU tests. There are unquestionably tests where Epyc falls behind its competition. Anandtech concludes:
AMD’s newest core is a formidable opponent. Scalar floating point operations are clearly faster on the AMD core, and integer performance is – at the same clock – on par with Intel’s best. The dual CCX layout and quad die setup leave quite a bit of performance on the table, so it will be interesting how much AMD has learned from this when they launch the 7nm “Rome” successor… All in all, it must be said that AMD executed very well and delivered a new server CPU that can offer competitive performance for a lower price point in some key markets. Server customers with non-scalar sparse matrix HPC and Big Data applications should especially take notice.
AMD’s MCM solution isn’t perfect, but it’s the solution the company needed for high-core-count server processors. It allowed AMD to use a single eight-core die across every Ryzen CPU without integrated graphics as well as Threadripper and Epyc, benefiting from economies of scale and pricing its server CPUs aggressively. When 12nm Ryzen CPUs launch in the next few months, we should get a preview of any changes AMD has made to the core or to Infinity Fabric. As both companies scale up, it’ll be interesting to see which approach wins out: connecting chips in an MCM or building a large monolithic die.