Cerebras Systems Unveils 1.2 Trillion Transistor Wafer-Scale Processor

Cerebras Systems Unveils 1.2 Trillion Transistor Wafer-Scale Processor

Credit: Getty Images

Modern CPU transistor counts are enormous — AMD announced earlier this month that a full implementation of its 7nm Epyc “Rome” CPU weighs in at 32 billion transistors. To this, Cerebras Technology says: “Hold my beer.” The AI-focused company has designed what it calls a Wafer Scale Engine. The WSE is a square, approximately eight inches by nine inches, and contains roughly 1.2 trillion transistors.

I’m genuinely surprised to see a company bringing a wafer-scale product to market this quickly. The idea of wafer-scale processing has attracted some attention recently as a potential solution to performance scaling difficulties. In the study we discussed earlier this year, researchers evaluated the idea of building an enormous GPU across most or all of a 100mm wafer. They found that the technique could product viable, high-performance processors and that it could also scale effectively to larger node sizes. The Cerebras WSE definitely qualifies as lorge large — its total surface area is much larger than the hypothetical designs we considered earlier this year. It’s not a full-sized 300mm wafer, but it’s got a higher surface area than a 200mm does.

Not Pictured: PCIe x1600 slot.
Not Pictured: PCIe x1600 slot.

As you can see, it compares fairly well.

The Cerebras WSE contains 400,000 sparse linear algebra cores, 18GB of total on-die memory, 9PB/sec worth of memory bandwidth across the chip, and separate fabric bandwidth of up to 100Pbit/sec. The entire chip is built on TSMC’s 16nm FinFET process. Because the chip is built from (most) of a single wafer, the company has implemented methods of routing around bad cores on-die and can keep its arrays connected even if it has bad cores in a section of the wafer. The company says it has redundant cores implemented on-die, though it hasn’t discussed specifics yet. Details on the design are being presented at Hot Chips this week.

The WSE — “CPU” simply doesn’t seem sufficient — is cooled using a massive cold plate sitting above the silicon, with vertically mounted water pipes used for direct cooling. Because there’s no traditional package large enough to fit the chip, Cerebras has designed its own. PCWorld describes it as “combining a PCB, the wafer, a custom connector linking the two, and the cold plate.” Details on the chip, like its raw performance and power consumption, are not yet available.

A fully functional wafer-scale processor, commercialized at scale, would be an exciting demonstration of whether this technological approach has any relevance to the wider market. While we’re never going to see consumer components sold this way, there’s been interest in using wafer-scale processing to improve performance and power consumption in a range of markets. If consumers continue to move workloads to the cloud, especially high-performance workloads like gaming, it’s not crazy to think we might one day see GPU manufacturers taking advantage of this idea — and building arrays of parts that no individual could ever afford to power cloud gaming systems in the future.

Continue reading

Asus Announces Chromebox 4 With Support for 10th Gen Core Processors
Asus Announces Chromebox 4 With Support for 10th Gen Core Processors

Chromebooks are so plentiful these days they might as well grow on trees. There are fewer Chromeboxes, but Asus has been keeping its line updated and just announced its latest version.

AMD Launches New Ryzen 5000 Mobile Processors
AMD Launches New Ryzen 5000 Mobile Processors

AMD's Ryzen Mobile 5000 chips are now shipping in OEM systems. So what kind of performance improvements do they really offer?

How Are Process Nodes Defined?
How Are Process Nodes Defined?

We talk about process nodes a lot at wfoojjaec, but we don't always drill down into how nodes are defined or how they differ between manufacturers. Let's fix that.

Samsung Stuffs 1.2TFLOP AI Processor Into HBM2 to Boost Efficiency, Speed
Samsung Stuffs 1.2TFLOP AI Processor Into HBM2 to Boost Efficiency, Speed

Samsung has developed a new type of processor-in-memory, built around HBM2. It's a new achievement for AI offloading and could boost performance by up to 2x while cutting power consumption 71 percent.