Google’s Cloud TPU Matches Volta in Machine Learning at Much Lower Prices


Over the past few years, Nvidia has established itself as a major leader in machine learning and artificial intelligence processing. The GPU designer dove into the HPC market over a decade ago when it launched the G80 and CUDA, its parallel computing platform and API. Early leadership has paid off for Nvidia; the company holds 87 spots on the TOP500 list of supercomputers, compared with just 10 for Intel. But as machine learning and artificial intelligence workloads proliferate, vendors are emerging to give Nvidia a run for its money, including Google with its new Cloud TPU. New benchmarks from RiseML pit Nvidia’s Volta against Google’s TPU head-to-head, and the cost curve strongly favors Google.

Because ML and AI are both young, fast-moving fields, it’s important that tests be conducted fairly and that benchmark runs don’t favor one architecture over another simply because the best testing parameters aren’t well known. To that end, RiseML allowed both Nvidia and Google engineers to review drafts of its test results and to offer comments and suggestions. The company also states that its figures were reviewed by an additional panel of outside experts in the field.

The comparison pits four Google TPUv2 chips (which together form one Cloud TPU) against four Nvidia Volta GPUs. Both configurations have 64GB of total memory, and the models were trained on the data sets in the same fashion. RiseML tested the ResNet-50 model (exact configuration details are available in the blog post), and the team investigated raw performance (throughput), accuracy, and convergence (an algorithm converges when its output comes closer and closer to a specific value).
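To make the convergence definition concrete, here is a minimal Python sketch that treats a training run as converged once its per-epoch validation accuracy stops changing by more than a small tolerance. The function name `has_converged` and the `tol` and `window` values are hypothetical choices for illustration; this is not the criterion RiseML used.

```python
def has_converged(accuracies, tol=1e-3, window=3):
    """Return True once the last `window` epoch-to-epoch changes are all below `tol`.

    `accuracies` is a list of per-epoch validation accuracies (0.0 to 1.0).
    `tol` and `window` are illustrative defaults, not values from the benchmark.
    """
    # Need window + 1 points to measure `window` successive changes.
    if len(accuracies) < window + 1:
        return False
    recent = accuracies[-(window + 1):]
    # Absolute change between each pair of consecutive epochs.
    deltas = [abs(b - a) for a, b in zip(recent, recent[1:])]
    return all(d < tol for d in deltas)


# Example: accuracy climbs quickly, then flattens out near 76 percent.
history = [0.50, 0.70, 0.75, 0.7605, 0.7610, 0.7612, 0.7613]
print(has_converged(history))
```

In practice, "training to convergence" means stopping once further epochs no longer buy meaningful accuracy, which is why the number of epochs needed feeds directly into training cost.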


The recommended batch size for TPUs is 1024, but RiseML also tested other batch sizes at readers’ request, and Nvidia does perform better at those smaller batch sizes. In accuracy and convergence, the TPU solution is somewhat better (76.4 percent top-1 accuracy for Cloud TPU, compared with 75.7 percent for Volta). Improvements to top-end accuracy are difficult to come by, and the RiseML team argues that the small difference between the two solutions matters more than you might think. But where Google’s Cloud TPU really wins, at least right now, is on pricing.
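Top-1 accuracy is the share of images for which the model’s single highest-scoring class is the correct one. The short Python sketch below computes it from a list of per-class scores; the function name and the toy scores are made up for illustration and are unrelated to RiseML’s data.

```python
def top1_accuracy(scores, labels):
    """Fraction of examples whose highest-scoring class index equals the true label.

    `scores` is a list of per-class score lists (one per example);
    `labels` holds the true class index for each example.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score; ties resolve to the first such index.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy example: three of the four predictions match their labels.
toy_scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
print(top1_accuracy(toy_scores, [1, 0, 2, 1]))
```

For scale, assuming the figures are measured on ImageNet’s 50,000-image validation set, the 0.7-point gap between 76.4 and 75.7 percent works out to about 350 more correctly classified images.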


RiseML writes:

Ultimately, what matters is the time and cost it takes to reach a certain accuracy. If we assume an acceptable solution at 75.7 percent (the best accuracy achieved by the GPU implementation), we can calculate the cost to achieve this accuracy based on required epochs and training speed in images per second. This excludes time to evaluate the model in-between epochs and training startup time.

As shown above, the current pricing of the Cloud TPU allows to train a model to 75.7 percent on ImageNet from scratch for $55 in less than 9 hours! Training to convergence at 76.4 percent costs $73. While the V100s perform similarly fast, the higher price and slower convergence of the implementation results in a considerably higher cost-to-solution.
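RiseML’s cost-to-accuracy figure follows from simple arithmetic: epochs times images per epoch, divided by training throughput, gives training time, and time multiplied by the hourly price gives cost. The Python sketch below spells that out. The ImageNet training-set size (1,281,167 images in ILSVRC-2012) is a known figure, but the function name and the epoch count, throughput and hourly price in the example call are hypothetical placeholders, not RiseML’s measurements or either vendor’s actual pricing.

```python
# ILSVRC-2012 (ImageNet) training set size.
IMAGENET_TRAIN_IMAGES = 1_281_167


def cost_to_accuracy(epochs, images_per_second, price_per_hour,
                     images_per_epoch=IMAGENET_TRAIN_IMAGES):
    """Estimate (hours, dollars) to train for `epochs` epochs.

    Like RiseML's figure, this ignores evaluation between epochs and startup time.
    """
    seconds = epochs * images_per_epoch / images_per_second
    hours = seconds / 3600
    return hours, hours * price_per_hour


# Hypothetical inputs for illustration only.
hours, dollars = cost_to_accuracy(epochs=90, images_per_second=3000, price_per_hour=6.50)
print(f"{hours:.1f} h, ${dollars:.2f}")
```

Because cost scales linearly with both the number of epochs and the hourly price, two platforms with similar throughput can still differ sharply in cost-to-solution if one needs more epochs to reach the target accuracy or charges more per hour, which is the pattern RiseML describes for the V100s.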

Google may be subsidizing its cloud processor pricing, and the exact performance characteristics of ML chips will vary depending on implementation and programmer skill. This is far from the final word on Volta’s performance, or even on Volta compared with Google’s Cloud TPU. But at least for now, in ResNet-50, Google’s Cloud TPU appears to offer nearly identical performance at substantially lower prices.
