Google’s AI-Focused Tensor Processing Units Now Available in Beta

Google has been working on its Tensor Processing Units, or TPUs, for several years now, and has released several papers on how its customized architecture performs in inference workloads compared with more traditional systems built around CPUs or GPUs. Now the company is opening these parts up for public beta testing, to help researchers who want to train machine learning models and run them more quickly.

Google has talked about making this capability public since it demonstrated its first-generation TPUs back in 2016. Those chips, however, were only good for inference workloads. The simple way to understand the difference between training and inference is that training is when you create your model and teach it the tasks you want it to perform, while inference is the actual process of applying what the machine has “learned.” Google never made its first-generation TPU available to corporations for general workloads, but these new chips can handle both model training and inference, and offer higher performance besides.
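To make the training/inference split concrete, here is a minimal sketch in plain Python. It is a toy illustration, not how TPUs or any Google framework work: the data (points on the line y = 2x + 1), the `train` and `infer` function names, and the learning-rate and epoch settings are all hypothetical choices made for this example. Training loops over the data many times adjusting the weights; inference is a single cheap pass that only reads them.

```python
# Hypothetical toy data for illustration: points on the line y = 2x + 1.
DATA = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]


def train(data, lr=0.01, epochs=2000):
    """Training: repeatedly adjust weights w and b to shrink mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Gradient-descent step: move each weight against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def infer(model, x):
    """Inference: apply the learned weights to a new input; nothing is updated."""
    w, b = model
    return w * x + b


model = train(DATA)              # expensive, iterative phase
print(round(infer(model, 10.0), 2))  # prints 21.0
```

The asymmetry in cost is the point: the training phase does thousands of passes over the data, while each inference call is one multiply and one add, which is why hardware tuned only for inference (like the first-generation TPU) is a narrower target than hardware that can also train.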

We don’t yet know how these new Cloud TPUs perform, but Google’s earlier comparison of its first-generation TPU against equivalent parts from Intel and Nvidia in inference workloads is summarized below:

Haswell’s roofline reaches its flat peak at 13 operations per byte. MLP0 (purple diamond) could theoretically benefit from both greater tuning and more memory bandwidth. The other tests are under the flat roofline, which means they aren't hitting a memory bottleneck. Google notes that LSTM0 and MLP1 are faster on Haswell than on Nvidia's K80.
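The "roofline" in these charts is a simple formula: attainable performance is the lesser of the chip's peak compute rate (the flat roof) and its memory bandwidth multiplied by the workload's operational intensity in operations per byte (the slanted roof). The point where the two meet is the ridge point. The sketch below assumes illustrative numbers (1.3 TeraOps/s and 100 GB/s, chosen only so the ridge lands at 13 ops/byte); they are not measured specs for Haswell or any other chip, and the function names are hypothetical.

```python
def attainable_ops(peak_ops, bandwidth_bytes, intensity):
    """Roofline model: performance is capped by compute or by memory traffic.

    peak_ops:        peak operations per second (height of the flat roof)
    bandwidth_bytes: memory bandwidth in bytes/s (slope of the slanted roof)
    intensity:       operations performed per byte fetched from memory
    """
    return min(peak_ops, bandwidth_bytes * intensity)


def ridge_point(peak_ops, bandwidth_bytes):
    """Operational intensity where the slanted roof meets the flat roof."""
    return peak_ops / bandwidth_bytes


def is_memory_bound(peak_ops, bandwidth_bytes, intensity):
    """Workloads left of the ridge point sit under the bandwidth-limited slope."""
    return intensity < ridge_point(peak_ops, bandwidth_bytes)


# Illustrative numbers only (NOT real Haswell specs), picked so the ridge is 13.
PEAK = 1.3e12  # 1.3 TeraOps/s
BW = 1.0e11    # 100 GB/s

print(ridge_point(PEAK, BW))               # prints 13.0
print(is_memory_bound(PEAK, BW, 5))        # prints True: under the slanted roof
print(attainable_ops(PEAK, BW, 50) == PEAK)  # prints True: hits the flat roof
```

A workload at 5 ops/byte can reach only 5 × 100 GB/s = 0.5 TeraOps/s no matter how fast the arithmetic units are, so more bandwidth would help it; a workload at 50 ops/byte is already pinned to the flat roof, which is what "not hitting a memory bottleneck" means in the charts.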

Nvidia's K80 pushes more TeraOps/sec than Haswell, and the slanted portion of its curve is even steeper. K80's practical performance falls below its theoretical peak in most tests because inference workloads prize low latency, making them a poor match for GPUs. Despite this, Nvidia is as fast as or faster than Intel in all but two workloads.

The TPU shows very different characteristics. Only two workloads aren't memory bandwidth-limited, and every single workload runs vastly faster than on the GPU or CPU. MLP1 is 48.5x faster than on K80, while CNN0 is 143x faster than on Haswell.

Finally, here are all three data sets combined. Stars show TPU performance, triangles represent K80, and circles are for Haswell. In every case, Google's first-generation TPU is significantly faster than either Haswell or K80, and not by small margins.

Each Cloud TPU consists of four separate ASICs, with a total of 180 TFLOPS of performance per board. Google plans to scale these offerings up even further, with a dedicated network and scale-out systems it’s calling “TPU Pods.” [Please don’t eat these either. -Ed] Google claims that even at this early stage, a researcher following one of its tutorials could use the public Cloud TPU service to “train ResNet-50 to the expected accuracy on the ImageNet benchmark challenge in less than a day, all for well under $200.”

Expect to see a lot of mud slung at the wall over the next few years, as literally everyone piles into this market. AMD has Radeon Instinct, and Intel still has its Xeon Phi accelerators: although it canceled the upcoming Knights Hill, it launched Knights Mill in December, with additional execution resources and better AVX-512 utilization. Whether Knights Mill will close the gap with Nvidia’s Tesla product family remains to be seen, but Google isn’t the only company deploying custom silicon to address this space. Fujitsu has its own line of accelerators in the works, and Amazon and Microsoft have previously deployed FPGAs in their own data centers and clouds.

Google’s new cloud offerings are billed by the second, with an average cost of $6.50 per Cloud TPU per hour. If you’re curious about signing up for the program, you can do so through Google Cloud. Cloud computing may have begun life as little more than a rebranding effort to capture previously available products under a catchy new term, but the entire semiconductor industry is now galloping towards these new computing paradigms as quickly as it can. From self-driving cars to digital assistants, “cloud computing” is being reinvented as something more significant than “everything I normally do, but with additional latency.” Ten years from now, it may be hard to remember why enterprises relied on anything else.
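Per-second billing means a job is charged the hourly rate prorated to the second it actually ran. The short sketch below assumes the quoted $6.50 per Cloud TPU per hour is the only charge (it ignores any other cloud resources a real job might use), and `cost_usd` is a hypothetical helper, not a Google API. It also checks Google's ResNet-50 claim: a full 24 hours on a single Cloud TPU would come to $156, which is indeed well under $200.

```python
HOURLY_RATE_USD = 6.50  # average Cloud TPU price quoted in the article (assumed sole charge)


def cost_usd(seconds, num_tpus=1, hourly_rate=HOURLY_RATE_USD):
    """Per-second billing: the hourly rate prorated to the exact runtime."""
    return num_tpus * hourly_rate * seconds / 3600


print(cost_usd(24 * 3600))  # prints 156.0: one Cloud TPU for a full day
print(cost_usd(90 * 60))    # prints 9.75: a 90-minute run, no rounding up to 2 hours
```

The second call shows why per-second billing matters for experimentation: a 90-minute job costs exactly 1.5 hours' worth, rather than being rounded up to two full hours.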
