Cerebras Unveils 2nd Gen Wafer Scale Engine: 850,000 Cores, 2.6 Trillion Transistors

Cerebras is back with the second generation of its Wafer Scale Engine. WSE 2.0 (sadly, the name “Son of Wafer-Scale” appears to have died in committee) is a 7nm shrink of the original design, with far more cores, more RAM, and 2.6 trillion transistors. That’s trillion, with a “T.” It makes the 54 billion on your average Nvidia A100 look a bit pedestrian, for a certain value of “pedestrian.”
The concept of a wafer-scale engine is simple: Instead of etching dozens or hundreds of chips into a wafer and then packaging those CPUs or GPUs for individual resale, why not use an entire wafer (or most of a wafer, in this case) for one enormous processor?
People have tried this trick before without success, but that was before yields improved to the point where building 850,000 cores on a piece of silicon the size of a cutting board was a reasonable idea. Last year, the Cerebras WSE-1 raised eyebrows by offering 400,000 cores, 18GB of on-chip memory, and 9PB/s of memory bandwidth, with 100Pb/s of fabric bandwidth across the wafer. Today, the WSE-2 offers 850,000 cores, 40GB of on-chip SRAM, 20PB/s of on-wafer memory bandwidth, and 220Pb/s of total fabric bandwidth.

While the new WSE-2 is certainly bigger, there’s not much sign it’s different. The top-line stat improvements are all impressive, but the gains are commensurate across the board, which is to say: A 2.12x increase in core count is closely matched by a 2.2x increase in RAM, a 2.2x increase in memory bandwidth, and a 2.2x increase in fabric bandwidth. The amount of RAM, RAM bandwidth, or fabric bandwidth, evaluated on a per-core basis, is virtually identical between the two WSEs: roughly 45KB of on-chip memory per core on the WSE-1 versus about 47KB on the WSE-2.
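For readers who want to check the math, here is a minimal Python sketch of the generation-over-generation and per-core comparison. The figures are the ones quoted in this article; the dictionary keys and function names are made up for illustration, and the unit conversions assume decimal prefixes (1GB = 10^9 bytes), which the article does not specify.

```python
# Spec figures as quoted in the article.
# Units: SRAM in GB, memory bandwidth in PB/s, fabric bandwidth in Pb/s.
# Key names are illustrative, not anything Cerebras publishes.
WSE1 = {"cores": 400_000, "sram_gb": 18, "mem_bw_pb_s": 9, "fabric_pbit_s": 100}
WSE2 = {"cores": 850_000, "sram_gb": 40, "mem_bw_pb_s": 20, "fabric_pbit_s": 220}


def generational_ratios(old, new):
    """Return new/old for every spec, e.g. how many times more cores."""
    return {key: new[key] / old[key] for key in old}


def per_core(spec):
    """Divide each resource by the core count.

    Assumes decimal prefixes: 1 GB = 1e6 KB, 1 PB/s = 1e6 GB/s,
    1 Pb/s = 1e6 Gb/s.
    """
    cores = spec["cores"]
    return {
        "sram_kb": spec["sram_gb"] * 1e6 / cores,
        "mem_bw_gb_s": spec["mem_bw_pb_s"] * 1e6 / cores,
        "fabric_gbit_s": spec["fabric_pbit_s"] * 1e6 / cores,
    }


if __name__ == "__main__":
    for name, ratio in generational_ratios(WSE1, WSE2).items():
        print(f"{name}: {ratio:.2f}x")
    for label, spec in (("WSE-1", WSE1), ("WSE-2", WSE2)):
        stats = per_core(spec)
        print(label, {key: round(value, 1) for key, value in stats.items()})
```

Every per-core figure moves by only a few percent between generations, which is the point: the WSE-2 is a scaled-up WSE-1 rather than a rebalanced one.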
Normally, with a second-generation design like this, we’d expect the company to make some resource allocation changes or to scale out some specific aspect of the design, such as adjusting the ratios between core counts, memory bandwidth, and total RAM. The fact that Cerebras chose to scale the WSE-1 up into the WSE-2 without rebalancing any of these ratios implies the company targeted its initial hardware well and could meet the desires of its customer base without compromising or changing other aspects of the WSE architecture.
One of Cerebras’ arguments in favor of its own designs is the simplicity of scaling a workload across a single WSE, rather than attempting to scale across the dozens or hundreds of GPUs that might be required to match its performance. It isn’t clear how easy it is to adapt workloads to the WSE-1 or WSE-2, and there don’t seem to be a lot of independent benchmarks available yet to compare scaling between the WSE-1 or WSE-2 and equivalent Nvidia cards. We would expect the WSE-2 to have the advantage in scaling, assuming the relevant workload fits the characteristics of both systems equally, due to the intrinsic difficulty of splitting a workload efficiently across an ever-larger number of accelerator cards.
Cerebras doesn’t appear to have published any benchmarks comparing the WSE-1 or WSE-2 against other systems, so we’re still in a holding pattern as far as that kind of data is concerned. Moving on from the WSE-1 to the WSE-2 this quickly, however, does imply some customer interest in the chip.