Samsung Stuffs 1.2TFLOP AI Processor Into HBM2 to Boost Efficiency, Speed

Samsung has announced the availability of a new Aquabolt variation. Unlike the typical clock speed jump or capacity improvement you’d expect, this new HBM-PIM can perform calculations directly on-chip that would otherwise be handled by an attached CPU, GPU, or FPGA.

PIM stands for processing-in-memory, and it’s a noteworthy achievement for Samsung to pull this off. Processors currently burn an enormous amount of power moving data from one location to another, and every transfer costs time as well. The less time a CPU spends moving data (or waiting on another chip to deliver data), the more time it can spend performing computationally useful work.

CPU developers have worked around this problem for years by deploying various cache levels and integrating functionality that once lived in its own socket. Both FPUs and memory controllers were once mounted on the motherboard rather than directly integrated into the CPU. Chiplets actually work directly against this integration trend, which is why AMD has had to be careful that its Zen 2 and Zen 3 designs could boost overall performance while disaggregating the CPU die.

If bringing the CPU and memory closer together is good, building processing elements directly into memory would be even better. Historically, this has been difficult because logic and DRAM are typically built very differently. Samsung has apparently solved this problem, and it’s leveraged the die-stacking capabilities of HBM to keep available memory density sufficiently high to interest customers. Samsung claims it can deliver a more than 2x performance improvement with a 70 percent power reduction at the same time, with no required hardware or software changes. The company expects validation to be complete by the end of the first half of this year.
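Taken together, the two headline claims imply a larger efficiency gain than either number suggests alone. The snippet below is a back-of-envelope sketch, not a Samsung figure: it assumes the 2x speedup and the 70 percent power reduction apply to the same workload at the same time, and the function name `perf_per_watt_gain` is ours, used only for illustration.

```python
def perf_per_watt_gain(speedup: float, power_reduction: float) -> float:
    """Relative performance-per-watt versus a baseline normalized to 1.0.

    speedup         -- claimed performance multiplier (2.0 means 2x)
    power_reduction -- claimed fractional power saving (0.70 means 70 percent)
    """
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in the range [0, 1)")
    relative_power = 1.0 - power_reduction  # power draw relative to baseline
    return speedup / relative_power


# Assumption: both of Samsung's claims hold simultaneously on one workload.
gain = perf_per_watt_gain(speedup=2.0, power_reduction=0.70)
print(f"Implied performance-per-watt gain: {gain:.2f}x")
```

If both claims hold, the part would do roughly six to seven times as much work per watt as the baseline it was measured against.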

Tom’s Hardware has some details about the new HBM-PIM solution, gleaned from Samsung’s ISSCC presentation this week. The new chip incorporates a Programmable Computing Unit (PCU) clocked at just 300MHz. The host controls the PCU via conventional memory commands and can use it to perform FP16 calculations directly in-DRAM. The HBM itself can operate either as normal RAM or in FIM (Function-in-Memory) mode.

Including the PCU reduces the total available memory capacity, which is why the FIMDRAM (that’s another term Samsung is using for this solution) only offers 6GB of capacity per stack instead of the 8GB you’d get with standard HBM2. All of the solutions shown are built on a 20nm DRAM process.

Samsung’s paper describes the design as “Function-In Memory DRAM (FIMDRAM) that integrates a 16-wide single-instruction multiple-data engine within the memory banks and that exploits bank-level parallelism to provide 4× higher processing bandwidth than an off-chip memory solution.”
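The 300MHz clock and the 16-wide SIMD engine allow a rough sanity check on the 1.2TFLOP figure in the headline. The sketch below rests on an assumption the article does not confirm: that each SIMD lane retires one fused multiply-add per cycle, counted as two floating-point operations. The number of compute units per stack is not given here either, so the code only derives how many units that assumption would imply. The function name `peak_flops` and its parameters are illustrative.

```python
def peak_flops(units: int, lanes: int, clock_hz: float,
               flops_per_lane_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in FLOP/s.

    flops_per_lane_per_cycle defaults to 2, i.e. one fused multiply-add
    per lane per cycle. That default is an assumption, not a quoted spec.
    """
    return units * lanes * clock_hz * flops_per_lane_per_cycle


CLOCK_HZ = 300e6        # PCU clock reported in the article
SIMD_LANES = 16         # SIMD width quoted from Samsung's paper
HEADLINE_FLOPS = 1.2e12 # the 1.2TFLOP figure from the headline

per_unit = peak_flops(units=1, lanes=SIMD_LANES, clock_hz=CLOCK_HZ)
implied_units = HEADLINE_FLOPS / per_unit

print(f"Peak per SIMD unit: {per_unit / 1e9:.1f} GFLOP/s")
print(f"Units implied by the headline figure: about {implied_units:.0f}")
```

Under that assumption a single unit peaks at just under 10 GFLOP/s, so the headline number only makes sense if on the order of a hundred or more units work in parallel across the stack, which fits the paper's emphasis on bank-level parallelism.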

One question Samsung hasn’t answered is how it deals with thermal dissipation, a key reason why it’s been historically difficult to build processing logic inside DRAM. This could be doubly difficult with HBM, in which each layer is stacked on top of another. The relatively low clock speed on the PIM may be a way of keeping DRAM cool.

We haven’t seen HBM deployed for CPUs much, Hades Canyon notwithstanding, but multiple high-end GPUs from Nvidia and AMD have tapped HBM/HBM2 as primary memory. It’s not clear if a conventional GPU would benefit from this offload capability, or how such a feature would be integrated with the GPU’s own impressive computational capacity. If Samsung can offer the performance and power improvements it claims to a range of customers, however, we’ll undoubtedly see this new HBM-PIM popping up in products a year or two from now. A 2x performance boost coupled with a 70 percent power consumption decrease is the kind of old-school improvement lithography node transitions used to deliver on a regular basis. It’s not clear if Samsung’s PIM will specifically catch on, but any promise of a classic full-node improvement will draw attention, if nothing else.
