Cerebras Unveils 2nd Gen Wafer Scale Engine: 850,000 Cores, 2.6 Trillion Transistors

Cerebras is back with the second generation of its Wafer Scale Engine. The WSE-2 — sadly, the name “Son of Wafer-Scale” appears to have died in committee — is a 7nm die shrink of the original, with far more cores, more on-chip RAM, and 2.6 trillion transistors (that’s trillion, with a “T”). That makes the 54 billion transistors on your average Nvidia A100 look a bit pedestrian, for a certain value of “pedestrian.”

The concept of a wafer-scale engine is simple: Instead of etching dozens or hundreds of chips into a wafer and then packaging those CPUs or GPUs for individual resale, why not use an entire wafer (or most of a wafer, in this case) for one enormous processor?

People have tried this trick before, with no success, but that was before modern yields improved to the point where building 850,000 cores on a piece of silicon the size of a cutting board was a reasonable idea. Last year, the Cerebras WSE-1 raised eyebrows by offering 400,000 cores, 18GB of on-chip memory, and 9PB/s of memory bandwidth, with 100Pb/s of fabric bandwidth across the wafer. Today, the WSE-2 offers 850,000 cores, 40GB of on-chip SRAM memory, and 20PB/s of on-wafer memory bandwidth. Total fabric bandwidth has increased to 220Pb/s.

While the new WSE-2 is certainly bigger, there’s not much sign it’s different. The top-line improvements are all impressive, but the gains are commensurate across the board: a 2.12x increase in core count is matched by roughly 2.2x increases in RAM, memory bandwidth, and fabric bandwidth. Evaluated on a per-core basis, the amount of RAM, memory bandwidth, and fabric bandwidth is virtually identical between the two WSEs.
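For the curious, here’s a quick back-of-the-envelope check of those ratios using the published figures quoted above. This is a rough sketch; the per-core numbers are simple division on our part, not official Cerebras specifications.

```python
# Published top-line figures for the WSE-1 and WSE-2.
# Memory bandwidth is in PB/s (bytes); fabric bandwidth is in Pb/s (bits).
wse1 = {"cores": 400_000, "sram_gb": 18, "mem_bw_pb_s": 9, "fabric_bw_pb_s": 100}
wse2 = {"cores": 850_000, "sram_gb": 40, "mem_bw_pb_s": 20, "fabric_bw_pb_s": 220}

# Generation-over-generation scaling factors.
for key in wse1:
    print(f"{key:>14}: {wse2[key] / wse1[key]:.2f}x increase")

# Per-core SRAM stays nearly flat across generations (illustrative division only).
for chip, name in ((wse1, "WSE-1"), (wse2, "WSE-2")):
    kb_per_core = chip["sram_gb"] * 1e9 / chip["cores"] / 1e3
    print(f"{name}: ~{kb_per_core:.0f} KB of SRAM per core")
```

Run it and the core count comes out to 2.12x while RAM, memory bandwidth, and fabric bandwidth all land between 2.2x and 2.22x, with on-chip SRAM per core sitting in the mid-40KB range for both chips.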

Normally, with a second-generation design like this, we’d expect the company to make some resource-allocation changes or to scale out some specific aspect of the design, such as adjusting the ratios between core count, memory bandwidth, and total RAM. The fact that Cerebras scaled the WSE-1 into the WSE-2 without touching those ratios implies the company targeted its initial hardware well and was able to grow the design to meet the needs of its customer base without compromising other aspects of the WSE architecture.

One of Cerebras’ arguments in favor of its own design is the simplicity of scaling a workload across a single WSE rather than across the dozens or hundreds of GPUs that might be required to match its performance. It isn’t clear how easy it is to adapt workloads to the WSE-1 or WSE-2, and there don’t seem to be many independent benchmarks available yet comparing scaling on either WSE against equivalent Nvidia hardware. We would expect the WSE-2 to have the advantage in scaling, assuming the workload suits both systems equally, given the intrinsic difficulty of splitting a workload efficiently across an ever-larger number of accelerator cards.

Cerebras doesn’t appear to have published any public benchmarks comparing the WSE-1 or WSE-2 against other systems, so we’re still in a holding pattern as far as that kind of data goes. Moving from the WSE-1 to the WSE-2 this quickly, however, does imply some customer interest in the chip.
