Chiplets Are Both Solution to and Symptom of a Larger Problem

When AMD announced its third-generation Ryzen CPUs, it also declared that it would use a new method of connecting the silicon inside them. Instead of building standard, monolithic CPUs (or connecting two monolithic dies together in what’s known as a Multi-Chip Module, or MCM), AMD opted for a new type of configuration called a chiplet. From a marketing perspective, chiplets have been a huge hit; I’ve seen a lot of readers very excited about what they bring to the table. But there’s some larger context around chiplets — and why we’re taking this step in semiconductor manufacturing — that deserves to be explored, particularly if you want to understand the larger issues driving the entire industry, rather than just AMD.

Chiplets are both a symptom of a larger problem the semiconductor industry is having and (hopefully) at least a short-term solution to that same problem. Because we know the most about AMD’s strategy, I’ll be referring to it throughout this piece, but AMD isn’t the only company adopting chiplets. Every developer of high-performance silicon appears to be at least evaluating this idea.

What’s a Chiplet?

For as quickly as this term has caught on, it doesn’t always get defined. A chiplet contains some of the specialized function blocks that we typically think of as making up a monolithic microprocessor. With its third-generation Ryzen CPUs, AMD has chosen to split off its I/O and DRAM controllers into a single separate die, while its CPU cores and L3 cache are contained within each individual chiplet.

Epyc’s I/O die, as shown at AMD’s New Horizon event.

This is not the only way to build a chiplet-based design. Over time, we expect to see manufacturers experiment with other design methodologies depending on the needs of their specific projects. Some chips might benefit from central pools of cache. In other cases, companies might opt to deploy asymmetric chiplets with different capabilities on each die. There were theories that AMD might deploy a third-generation Ryzen APU with a CPU for one chiplet and a GPU for the other, but AMD has stated it will not use its Matisse architecture for this purpose. When we eventually do see a chiplet-based APU, it may be that AMD will continue to build the CPU and GPU on the same die, but will keep I/O and DRAM in a separate IC — or it could have a completely different subdivision in mind.

One of the major design goals of chiplets is to offer manufacturers more options when deciding which components of a design to shrink versus which to keep at the same size and process node.
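To make that partitioning concrete, here’s a minimal sketch in Python (purely illustrative; the die names, node assignments, and block lists are assumptions based on AMD’s public descriptions of Matisse, not a real design database) of a package modeled as independent dies, each free to sit on its own process node:

```python
# Toy model of a chiplet-based package: each die carries its own function
# blocks and, crucially, its own process node.
from dataclasses import dataclass

@dataclass
class Die:
    name: str
    node_nm: int       # process node this die is fabricated on
    blocks: list[str]  # function blocks placed on this die

# A Matisse-style partition: CPU cores and L3 on leading-edge chiplets,
# I/O and the DRAM controller left behind on a cheaper, mature node.
package = [
    Die("CCD0", 7, ["8x Zen 2 cores", "L3 cache"]),
    Die("CCD1", 7, ["8x Zen 2 cores", "L3 cache"]),
    Die("IOD", 12, ["DRAM controller", "PCIe", "USB", "SATA"]),
]

for die in package:
    print(f"{die.name}: {die.node_nm}nm -> {', '.join(die.blocks)}")
```

The only point of the model is that node_nm can vary per die; a monolithic design, by definition, has no such freedom.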

The Limits to Shrink

“Faster, smaller, cheaper” has been the mantra of the computer industry for at least the past 60 years. The fundamental premise of Moore’s law as originally formulated by Gordon Moore was that advances in technological manufacturing would lead to advances in integration. It was the ability to build components side by side that allowed for the creation of the first CPUs and, later, that allowed those CPUs to absorb additional functionality and capability.

But while many different components had to scale downward for decades in order for this process to occur the way it did, the amount of scaling available to each component has never been uniform. As a simplified explanation: There is a point at which it no longer makes sense to make contact pads smaller or to attempt to build thinner wires, because the increase in electrical resistance outweighs any benefit in power reduction. This irregular scaling is not new. Analog circuits also do not scale with new process nodes, and interfacing analog and digital circuitry on the same SoC has become more difficult as we’ve hit lower nodes. What’s new — the problem that necessitated the adoption of new manufacturing strategies — is that we’re now facing so many scaling issues that it makes sense to break with 60 years of precedent and start breaking the CPU apart again.
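To see why thinner wires stop paying off, consider the textbook resistance formula R = ρL/A. A sketch with invented dimensions (no real process implied) shows how resistance climbs as every dimension shrinks:

```python
# Back-of-the-envelope: shrink a wire's every dimension by s < 1 and its
# cross-section A falls with s^2 while its length L falls only with s, so
# total resistance R = rho * L / A rises by 1/s. Dimensions are made up.
RHO_CU = 1.68e-8  # bulk copper resistivity, ohm-meters (nanoscale wires fare worse)

def wire_resistance(length_m: float, width_m: float, height_m: float) -> float:
    return RHO_CU * length_m / (width_m * height_m)

L, W, H = 1e-3, 100e-9, 200e-9   # a 1 mm wire with a 100 nm x 200 nm cross-section
for s in (1.0, 0.7, 0.5):        # one node shrink is roughly a 0.7x linear scale
    r = wire_resistance(L * s, W * s, H * s)
    print(f"{s:.1f}x linear shrink -> {r:.0f} ohms")  # 840, 1200, 1680
```

Even this idealized model understates the problem, since surface and grain-boundary scattering raise copper’s effective resistivity at nanoscale widths.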

The fact that we can no longer scale every aspect of the CPU down to a new node and expect to benefit is a fundamental shift from the past when this assumption was the default. Future improvements in I/O or any other component “left behind” on an older node will likely need to be delivered by better algorithms, packaging improvements, or materials engineering, not process node shrinks.

In theory, this could result in smaller improvements on a per-node basis. If you used to be able to improve the design of an entire chip by 15 percent (for whatever improvement metric you are targeting) and you now have to limit your applied improvements to the 50 percent of the CPU you are targeting for a die shrink, you may see smaller absolute gains on the whole.
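As a deliberately simple first-order sketch of that dilution (assuming the metric, such as die area or total power, scales linearly with the fraction of the chip that shrinks; performance speedups would need an Amdahl’s-law treatment instead):

```python
# First-order dilution: a gain applied to only part of a chip scales down
# linearly for additive metrics like area or power. Numbers are illustrative.
def whole_chip_gain(improvement: float, fraction_shrunk: float) -> float:
    """Whole-chip gain when `improvement` applies only to `fraction_shrunk`."""
    return improvement * fraction_shrunk

print(f"{whole_chip_gain(0.15, 1.0):.1%}")  # 15.0% -- the old full-chip shrink
print(f"{whole_chip_gain(0.15, 0.5):.1%}")  # 7.5% -- only half the chip shrinks
```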

AMD predicts that its cost per yielded square millimeter will double from 14/16nm to 7nm. The impact of yield, in other words, has already been taken into account in this graph.

What Chiplets Can Solve, Sort of

Chiplets can address several negative trends in semiconductor manufacturing, at least to a point. First, they present manufacturers with a potentially more efficient means of achieving die shrinks by focusing research and development on the portions of the chip that can be profitably shrunk. With a monolithic design, chipmakers have to shrink the entire chip, even if certain blocks aren’t being updated and won’t benefit from the new node.

Second, building smaller chips allows for less wafer waste (smaller dies leave less unusable room at the wafer’s edge), more dies per wafer, and better yields. In a monolithic design, one bad CPU core out of 18 means, at best, that you can’t sell the CPU as a full 18-core chip; it will have to be binned into a lower-priced segment. With chiplets, you theoretically give up less when you throw out one bad chiplet than when you down-price or toss an entire monolithic die. The exact savings depend on the specifics of your yield rates and your options for selling less-than-perfect chips, but the possibility is certainly there.
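A sketch of that yield arithmetic, using the simple Poisson defect model Y = exp(−A·D0); the defect density and die sizes below are invented for illustration, not AMD’s actual figures:

```python
import math

D0 = 0.2           # assumed defect density, defects per cm^2 (illustrative)
WAFER_CM2 = 706.9  # usable area of a 300 mm wafer, roughly pi * 15^2

def poisson_yield(area_cm2: float) -> float:
    """Fraction of dies of a given area that come out defect-free."""
    return math.exp(-area_cm2 * D0)

mono_cm2 = 4.0              # one hypothetical 16-core monolithic die, 400 mm^2
chiplet_cm2 = mono_cm2 / 2  # one 8-core chiplet, 200 mm^2

# Sellable 16-core parts per wafer (edge loss and binning ignored for simplicity).
# Monolithic: a single defect condemns the whole 400 mm^2 die.
mono_parts = (WAFER_CM2 / mono_cm2) * poisson_yield(mono_cm2)
# Chiplets: a defect condemns only a 200 mm^2 die; good chiplets pair up freely.
chiplet_parts = (WAFER_CM2 / chiplet_cm2) * poisson_yield(chiplet_cm2) / 2

print(f"monolithic 16-core parts per wafer: {mono_parts:.0f}")    # ~79
print(f"chiplet-based 16-core parts:        {chiplet_parts:.0f}") # ~118
```

Under Poisson statistics, two clean 200 mm² dies are exactly as likely as one clean 400 mm² die, so the advantage comes entirely from being able to discard bad chiplets individually and re-pair the good ones, which leaves roughly 50 percent more sellable parts per wafer in this toy scenario.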

Third, chiplets theoretically allow manufacturers to specialize individual function blocks for specific materials and processes. Achronix makes this point in a recent chiplet-focused PDF, writing:

The semiconductor material used to make each chiplet is not limited to silicon, which is another chiplet advantage. For example, specialized chiplets could be made from a variety of composite semiconductor materials including SiGe (silicon germanium), GaAs (gallium arsenide), GaN (gallium nitride), or InP (indium phosphide) to exploit the unique electronic properties of these semiconductor materials.

But again, this would be a profound departure from traditional CPU design. GaN, InP, GaAs, and SiGe exist on the fringes of mainstream silicon, used for specialized purposes where their particular traits give them an advantage over traditional manufacturing. This is why, despite the specific advantages of these materials for certain kinds of chips, we don’t see them used in, say, your typical Core i7 or AMD Ryzen.

The Benefits of Chiplets Can’t Be Separated From the Difficulties Driving Their Use

While AMD has been the company most closely associated with chiplets in recent months, it’s far from the only one working on the tech. Intel’s EMIB and Foveros technologies both have potential chiplet applications. This is an area multiple companies are pushing into because it’s expected to be a path forward that can work for many applications.

We already know that AMD’s third-generation Ryzen will deliver significant improvements in power consumption and overall performance. Clearly, the advantages of breaking CPUs apart to continue moving to smaller process nodes outstrip the advantage of keeping monolithic designs, at least for AMD. Other companies will likely follow suit.

But the adoption of chiplets is also an engineering acknowledgment of constraints that didn’t use to exist. We didn’t need chiplets when process scaling alone could deliver generational gains. When companies like TSMC publicly predict that their 5nm node will deliver much smaller performance and power improvements than previous nodes did, it’s partly an acknowledgment that the improvements engineers have gotten used to delivering from process nodes will now have to be gained in a different fashion. No one is particularly sure how to do this, and analyses of how effectively engineers can boost performance without additional transistors to throw at the problem have not been optimistic. Initiatives to examine the performance impact of wafer-scale processing are another example of how engineers are looking for new ways of building chips, or optimizing them post-manufacture, in order to deliver the performance improvements we once gained from node shrinks.

We’re going to be talking a lot about chiplets once third-generation Ryzen arrives and we have an opportunity to drill down into how AMD has adopted this technology and what the benefits are. Later chips will undoubtedly give us a broader view of the tradeoffs. But as far as how to think about chiplets: I’d call them a smart adaptation to a fundamental problem. They are not made from fairy dust or unicorn horns. They do not magically re-enable the sort of CPU scaling we used to see years ago. They do not automatically or intrinsically offer better performance — it’s possible for them to do this, but it’s not a given — and the advantages they do convey in terms of yields and cost should be viewed as a response to skyrocketing node pricing and general yield difficulties that take longer to sort out than they once did.

It’s important to keep the state of the larger ecosystem in mind when evaluating what to expect as far as chiplet-derived improvements. The industry collectively invented chiplets because it needed them to continue offering improvements from one generation to the next, even if that meant throwing decades of design orthodoxy out the window. They’re both an encouraging demonstration that we have continued to find solutions to scaling problems and a reminder that the laws of physics are tightening around us, creating the requirement for such solutions in the first place.
