AMD Files Patent for Its Own GPU Chiplet Implementation

AMD has filed for a patent on a chiplet-based approach to GPU design. One of the key goals of this approach is to create larger GPU configurations than are possible with a single, monolithic die.

AMD is the third company to share a little information on how it might approach this problem, though that’s probably stretching the definition of “sharing” a bit. You can find the patent here — we’ll briefly look at what Intel and Nvidia have proposed before we talk about AMD’s patent filing.

Intel has previously stated that its Ponte Vecchio data center GPU would use a new memory architecture (Xe-MF) built on EMIB and Foveros. EMIB is a technique for connecting different chips within the same package, while Foveros uses large through-silicon vias to give off-die hardware blocks effectively on-die connectivity. This approach relies specifically on packaging and interconnect technology Intel has designed for its own use.

Nvidia proposed what it called a Multi-Chip Module GPU, or MCM-GPU, in a 2017 paper. The design treats the package as a NUMA system to resolve problems intrinsic to distributing workloads across multiple GPUs, and adds features intended to reduce on-package bandwidth usage, such as an L1.5 cache, though Nvidia acknowledged unavoidable latency penalties when hopping across the various interconnected GPU modules.

AMD’s method envisions a GPU chiplet organized somewhat differently from what we’ve seen from the 7nm CPUs it has launched to date. Organizing a GPU into an effective chiplet design can be difficult due to restrictions on inter-chiplet bandwidth. This is less of a problem with CPUs, where cores don’t necessarily communicate all that much, and there aren’t nearly as many of them. A GPU has thousands of cores, while even the largest x86 CPUs have just 64.

One of the problems Nvidia highlighted in its 2017 paper was the need to take pressure off the limited bandwidth available for module-to-module communication. The L1.5 cache the company proposed is meant to alleviate exactly this problem.
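To make the bandwidth-saving role of that cache concrete, here's a minimal Python sketch. All names here (such as `remote_read`) are illustrative, not from Nvidia's paper: a module-local cache absorbs repeated reads of remote data, so only the first access crosses the bandwidth-limited inter-module link.

```python
# Hypothetical sketch of the idea behind Nvidia's proposed L1.5 cache.
# Each GPU module caches lines fetched from a *remote* module, so
# repeated remote reads don't re-cross the inter-module link.

def remote_read(addr, l15, remote_mem, stats):
    if addr in l15:                # hit: served locally, no link traffic
        return l15[addr]
    stats["link_crossings"] += 1   # miss: must hop to the remote module
    l15[addr] = remote_mem[addr]
    return l15[addr]

stats = {"link_crossings": 0}
l15 = {}                           # the module-local "L1.5"
remote_mem = {7: "vertex"}         # data homed on another module
for _ in range(3):
    remote_read(7, l15, remote_mem, stats)
print(stats["link_crossings"])     # -> 1: only the first read crosses
```

Three reads of the same remote line cost one link crossing instead of three; the trade-off, as Nvidia noted, is the latency of that first hop.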

The implementation AMD describes in its patent is different from what Nvidia envisions. AMD ties both work group processors (WGPs, the shader cores) and GFX blocks (fixed-function units) directly to the L1 cache. The L1 cache itself connects to a Graphics Data Fabric (GDF), which links the L1 to the L2. The L2 cache is coherent within any single chiplet, and any WGP or GFX block can read data from any part of it.
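As a rough sketch of that per-chiplet hierarchy (the `Chiplet` class and its methods are hypothetical, purely for illustration), an access from a WGP or GFX block checks the shared L1 first, and the GDF carries a miss down to the chiplet-coherent L2:

```python
# Illustrative model of the per-chiplet hierarchy AMD describes: WGP and
# GFX blocks share an L1, and the Graphics Data Fabric (GDF) forwards L1
# misses to an L2 that is coherent across the whole chiplet.

class Chiplet:
    def __init__(self):
        self.l1 = {}   # shared by WGPs and fixed-function GFX blocks
        self.l2 = {}   # coherent across the entire chiplet

    def read(self, addr, memory):
        if addr in self.l1:            # L1 hit: no fabric traffic
            return self.l1[addr], "L1"
        if addr in self.l2:            # GDF carries the L1 miss to L2
            self.l1[addr] = self.l2[addr]
            return self.l1[addr], "L2"
        value = memory[addr]           # miss in both: fill each level
        self.l2[addr] = value
        self.l1[addr] = value
        return value, "MEM"

memory = {0x10: 42}
chip = Chiplet()
print(chip.read(0x10, memory))  # -> (42, 'MEM')
print(chip.read(0x10, memory))  # -> (42, 'L1')
```

Because the L2 is coherent chiplet-wide, any block's second touch of a line another block fetched would already find it in the L2 in this model.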

To wire multiple GPU chiplets into a cohesive GPU processor, AMD first connects each chiplet's L2 cache banks to a scalable data fabric (SDF), and the SDF on each chiplet is in turn wired to its neighbors through the HPX passive crosslink. That crosslink handles the job of inter-chiplet communication, and it also attaches to the L3 cache banks on each chiplet. In this implementation, the GDDR lanes are wired to the L3 cache.

AMD’s patent assumes that only one GPU chiplet connects to the CPU, with the passive interconnect tying the rest together via a large, shared L3 cache. Nvidia’s MCM-GPU doesn’t use an L3 in this fashion.
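A minimal sketch of that arrangement, assuming a simple address interleave across L3 banks (the bank count, the `home_chiplet` function, and the interleave scheme are my assumptions for illustration, not details from the patent):

```python
# Hypothetical model: only chiplet 0 faces the CPU, while the passive
# crosslink routes requests to L3 banks spread across all chiplets.
# L3 lines are striped across chiplets by address, so any given request
# may have to cross the crosslink to reach its home bank.

N_CHIPLETS = 4

def home_chiplet(addr):
    # Assumed interleave: which chiplet's L3 bank owns this line
    return addr % N_CHIPLETS

def gpu_read(addr, l3_banks):
    owner = home_chiplet(addr)
    # Requests enter at chiplet 0; other owners require a crosslink hop
    hop = "local" if owner == 0 else "crosslink"
    return l3_banks[owner].get(addr), hop

l3_banks = [dict() for _ in range(N_CHIPLETS)]
l3_banks[2][6] = "texel"        # line 6 homes on chiplet 2 (6 % 4 == 2)
print(gpu_read(6, l3_banks))    # -> ('texel', 'crosslink')
print(gpu_read(4, l3_banks))    # -> (None, 'local'); 4 % 4 == 0
```

The point of the shared L3 in this model is that every chiplet's bank is reachable through the passive crosslink, so the GPU presents one cache pool to the single CPU-facing chiplet.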

Theoretically, this is all very interesting, and we’ve already seen AMD ship a GPU with a big honkin’ L3 on it, courtesy of RDNA2’s Infinity Cache. Whether AMD will actually ship a part using GPU chiplets is a very different question from whether it wants patents on various ideas it might want to use.

Decoupling the CPU and GPU essentially reverses the work that went into combining them in the first place. One of the basic challenges the GPU chiplet approach must overcome is the intrinsically higher latencies created by moving these components away from each other.

Multi-chip GPUs are a topic that AMD and Nvidia have both been discussing for years. This patent doesn’t confirm that any products will hit the market in the near term, or even that AMD will ever pursue the approach at all.
