Google Announces 8x Faster TPU 3.0 For AI, Machine Learning

For the past few years, Google has been building its own TPUs (Tensor Processing Units) to handle various processing tasks related to artificial intelligence and machine learning. Google first announced the existence of TPUs in 2016, but said it had been using them internally for more than a year. The company’s second-generation TPUs have made the news recently for significantly improved performance, and the third-generation hardware will apparently continue that trend.

According to Google CEO Sundar Pichai, the new TPU 3.0 pods are 8x more powerful than the Google TPU 2.0 pods that we’ve previously covered, and Pichai declared that they’re power-hungry enough to need water cooling — something previous TPUs simply haven’t required. TechCrunch’s image shows a copper-plated cooler system, with water from the same pipe run through all four cooling plates.

Image by TechCrunch

So what do we know about TPU 3.0? Not much — but we can make a few educated guesses. According to Google’s own documentation, TPU 1.0 was built on a 28nm process node at TSMC, clocked at 700MHz, and consumed 40W of power. Each TPU PCB connected via PCIe 3.0 x16.

TPU 2.0 made some significant changes. Unlike TPU v1, which could only handle 8-bit integer operations, Google added support for single-precision floats in TPU v2 and added 8GB of HBM memory to each TPU to improve performance. A TPU cluster offers 180 TFLOPS of total computational power, 64GB of HBM memory, and 2,400GB/s of memory bandwidth in total (the last thrown in purely for the purposes of making PC enthusiasts moan with envy).

Unlike TPU v1, which used 3.5-inch drive bays as its form factor, TPU v2 was welded together in groups of four ASIC chips. Google currently deploys TPUs in clusters of up to 64 boards, at 11.5 PFLOPS per cluster and 4TB of total HBM storage. Power consumption requirements were already estimated to be pretty high with last year’s model, and these solutions consume even more power, so the switch to water cooling makes sense — it’s probably the only way to deal with the heat output, especially if Google is stuffing 64 TPUs into a single cluster.
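The cluster figures above follow from the per-board numbers, and a quick back-of-the-envelope check makes that visible. The sketch below is only illustrative: the constant and function names are made up for this example, and it assumes the "8x more powerful" claim refers to peak throughput of a full cluster, which Google has not spelled out.

```python
# Per-board TPU v2 figures as quoted in the article.
BOARD_TFLOPS = 180      # TFLOPS per four-chip board
BOARD_HBM_GB = 64       # GB of HBM per board
BOARDS_PER_POD = 64     # boards in a full cluster


def pod_totals(boards=BOARDS_PER_POD, tflops=BOARD_TFLOPS, hbm_gb=BOARD_HBM_GB):
    """Return (PFLOPS, TB of HBM) for a cluster of identical boards."""
    pflops = boards * tflops / 1000   # 1 PFLOPS = 1000 TFLOPS
    hbm_tb = boards * hbm_gb / 1024   # 1 TB = 1024 GB (binary convention)
    return pflops, hbm_tb


v2_pflops, v2_hbm_tb = pod_totals()

# Assumption: "8x" is applied to the peak throughput of the whole cluster.
v3_pflops_estimate = 8 * v2_pflops

print(f"TPU v2 cluster: {v2_pflops:.2f} PFLOPS, {v2_hbm_tb:.0f} TB HBM")
print(f"8x estimate:    {v3_pflops_estimate:.1f} PFLOPS")
```

Under those assumptions, 64 boards at 180 TFLOPS each works out to about 11.5 PFLOPS and 4TB of HBM, and an eightfold increase would land at roughly 92 PFLOPS for a full cluster.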

No word yet on other advanced capabilities of the processors, and they are supposedly still for Google’s own use, rather than wider adoption. Pichai claims TPU v3 can handle 100 PFLOPS, but that has to be the clustered variant, unless Google is also rolling out a new tentative project we’ll call “Google Stellar-Equivalent Thermal Density.” We would’ve expected to hear about it if that were the case. As more companies flock to the AI / ML banner, expect to see more firms throwing their hats into this proverbial ring.
