What is Speculative Execution?

As discussion of the Spectre and Meltdown bugs dominates the tech news cycle, there’s been repeated reference to a specific feature of high-end CPUs: speculative execution. It’s a key capability of higher-end ARM products, Apple’s custom ARM cores, IBM’s POWER family, and the vast majority of the x86 processors produced by Intel and AMD. Here’s what speculative execution is and how it relates to other key capabilities of modern microprocessors, and how the recent Meltdown bug targets Intel CPUs in particular.

What is Speculative Execution?

Speculative execution is a technique CPU designers use to improve CPU performance. It’s one of three components of out-of-order execution, also known as dynamic execution. The other two are multiple branch prediction (used to predict the instructions most likely to be needed in the near future) and dataflow analysis (used to align instructions for optimal execution, as opposed to executing them in the order they came in). Together, these techniques delivered a dramatic performance improvement over previous Intel processors.

Here’s how it works. Modern CPUs are all pipelined, which means they work on multiple instructions at once, with each instruction at a different stage of completion, as shown in the diagram below.

Image by Wikipedia. This is a general diagram of a pipelined CPU, showing how instructions move through the processor from clock cycle to clock cycle.

Imagine that the green block represents an if-then-else branch. The branch predictor calculates which branch is more likely to be taken, fetches the next set of instructions associated with that branch, and begins speculatively executing them before it knows which of the two code branches it’ll be using. In the diagram above, these speculative instructions are represented as the purple box. If the branch predictor guessed correctly, then the next set of instructions the CPU needed are lined up and ready to go, with no pipeline stall or execution delay.
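To make the guessing step concrete, here is a minimal sketch of a two-bit saturating counter, a textbook prediction scheme. It is an assumption chosen for illustration, not a description of how any Intel, AMD, or ARM predictor actually works, and the `simulate_predictor` function and the loop workload are both hypothetical.

```python
# Toy 2-bit saturating-counter branch predictor (textbook scheme, assumed
# here for illustration; real predictors are far more elaborate).

def simulate_predictor(outcomes, initial_state=2):
    """Return the fraction of branches predicted correctly.

    outcomes: iterable of booleans, True meaning the branch was taken.
    States 0-1 predict "not taken"; states 2-3 predict "taken".
    """
    state = initial_state
    correct = 0
    total = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
        total += 1
    return correct / total if total else 0.0


# Hypothetical workload: a loop branch taken 99 times, then not taken once,
# with the whole loop run 10 times.
loop_branch = ([True] * 99 + [False]) * 10
accuracy = simulate_predictor(loop_branch)
print(f"accuracy: {accuracy:.2%}")  # accuracy: 99.00%
```

The only miss in each pass is the final loop exit, which is why even a scheme this simple does well on loop-heavy code.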

Without branch prediction and speculative execution, the CPU doesn’t know which branch it will take until the first instruction in the pipeline (the green box) finishes executing and moves to Stage 4. Instead of moving straight from one set of instructions to the next, the CPU has to wait for the appropriate instructions to arrive. This hurts system performance, since it’s time the CPU could be performing useful work.

The reason it’s called “speculative” execution, of course, is that the CPU might be wrong. If it is, the CPU discards the speculative work, loads the appropriate instructions, and executes those instead. But branch predictors aren’t wrong very often; accuracy rates are typically above 95 percent.
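A rough back-of-the-envelope model shows why that accuracy figure matters. The sketch below assumes a hypothetical pipeline in which a branch takes 15 cycles to resolve, so a CPU that never predicts waits 15 cycles on every branch, while a predicting CPU only pays that cost when it guesses wrong. The 15-cycle figure is an illustrative assumption, not a measurement of any real chip.

```python
def cycles_lost_per_branch(accuracy, resolve_cycles):
    """Expected cycles wasted per branch when the CPU predicts and speculates.

    A correct guess costs nothing extra; a wrong guess throws away
    resolve_cycles worth of speculative work.
    """
    return (1.0 - accuracy) * resolve_cycles


RESOLVE_CYCLES = 15   # hypothetical branch-resolution latency
ACCURACY = 0.95       # the "above 95 percent" figure, taken as a floor

without_prediction = float(RESOLVE_CYCLES)  # stall on every branch
with_prediction = cycles_lost_per_branch(ACCURACY, RESOLVE_CYCLES)

print(f"stalling on every branch: {without_prediction:.2f} cycles per branch")
print(f"predicting at 95%:        {with_prediction:.2f} cycles per branch")
```

Under these assumptions the average cost per branch drops from 15 cycles to less than one, which is the whole argument for speculating in the first place.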

Why Use Speculative Execution?

Decades ago, before out-of-order execution was invented, CPUs were what we today call “in order” designs. Instructions executed in the order they were received, with no attempt to reorder them or execute them more efficiently. One of the major problems with in-order execution is that a pipeline stall stops the entire CPU until the issue is resolved.
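Here is a toy illustration of that stall problem. It assumes a made-up machine that can start one instruction per cycle, and a made-up five-instruction program in which a slow load is followed by work that doesn’t depend on it. The `run` function and the latencies are hypothetical and ignore most of what a real scheduler has to deal with.

```python
def run(program, in_order):
    """Return the cycle at which the last instruction's result is ready.

    program: list of (latency, deps) tuples, where deps lists the indices of
    earlier instructions whose results are needed. One instruction may start
    per cycle.
    """
    done_at = {}                       # index -> cycle its result is ready
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        # An in-order core may only consider the oldest pending instruction;
        # an out-of-order core may start any instruction whose inputs are ready.
        candidates = pending[:1] if in_order else pending
        for idx in candidates:
            latency, deps = program[idx]
            if all(d in done_at and done_at[d] <= cycle for d in deps):
                done_at[idx] = cycle + latency
                pending.remove(idx)
                break
        cycle += 1
    return max(done_at.values(), default=0)


program = [
    (10, []),      # 0: slow load from memory
    (1, [0]),      # 1: add that needs the loaded value
    (3, []),       # 2: independent multiply
    (1, []),       # 3: independent add
    (1, [2, 3]),   # 4: add that needs instructions 2 and 3
]

print("in-order:    ", run(program, in_order=True))   # in-order:     15
print("out-of-order:", run(program, in_order=False))  # out-of-order: 11
```

The in-order machine sits idle behind the load, while the out-of-order machine uses the wait to finish the independent work.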

The other problem that drove the development of speculative execution was the gap between CPU and main memory speeds. The graph below shows the gap between CPU and memory clocks. As the gap grew, the amount of time the CPU spent waiting on main memory to deliver information grew as well. Features like L1, L2, and L3 caches and speculative execution were designed to keep the CPU busy and minimize the time it spent idling.
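The standard way to reason about what caches buy you is average memory access time: the hit time plus the miss rate multiplied by the miss penalty. The sketch below applies that formula to a hypothetical two-level cache. Every latency and miss rate in it is an assumed, illustrative number, not a figure for any specific processor.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit_time + miss_rate * miss_penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical latencies in CPU cycles and hypothetical miss rates.
MEMORY_LATENCY = 200
L2_HIT, L2_MISS_RATE = 12, 0.20
L1_HIT, L1_MISS_RATE = 4, 0.05

# An L1 miss falls through to L2; an L2 miss falls through to main memory.
l2_time = amat(L2_HIT, L2_MISS_RATE, MEMORY_LATENCY)
l1_time = amat(L1_HIT, L1_MISS_RATE, l2_time)

print(f"no caches:   {MEMORY_LATENCY} cycles per access")
print(f"with caches: {l1_time:.1f} cycles per access")
```

With these made-up numbers the average access drops from 200 cycles to under 7, even though the underlying memory is no faster.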

If memory could match the performance of the CPU there’d be no need for caches.

It worked. The combination of large off-die caches and out-of-order execution gave Intel’s Pentium Pro and Pentium II opportunities to stretch their legs in ways previous chips couldn’t match. This graph from an Anandtech article shows the advantage clearly.

Ultimately, it was the Pentium II that delivered the benefits of out-of-order execution to most consumers. The Pentium II was a fast microprocessor relative to the Pentium systems that had been top-end just a short while before. AMD was an absolutely capable second-tier option — my primary PC all through college was an AMD K6-233, which became a K6-2 400, which got a new motherboard with support for the K6-2+ and became an overclocked K6-2+ 550. But until the original Athlon launched, Intel could make a very honest claim to the overarching performance crown.

The Pentium Pro and the later Pentium II were far faster than the earlier architectures Intel used. This wasn’t guaranteed. When Intel designed the Pentium Pro (the first CPU to use speculative execution) it spent a significant amount of its die and power budget to bring the chip to market. But the bet paid off, big time.

There are differences between how Intel, AMD, and ARM implement speculative execution, and those differences are part of why Intel is exposed on Meltdown in ways that the other vendors aren’t. But speculative execution, as a technique, is simply far too valuable to stop using. Every single high-end CPU architecture today — AMD, ARM, IBM, Intel, SPARC — uses out-of-order execution. And speculative execution, while implemented differently from company to company, is used by each of them.

Why is Meltdown Such a Problem for Intel?

Meltdown causes such unique headaches for Intel because Intel allows speculative execution to access privileged memory a user-space application would never be allowed to touch. Here’s how MarkCC of Goodmath.org describes the problem:

Code that’s running under speculative execution doesn’t do the check whether or not memory accesses from cache are accessing privileged memory. It starts running the instructions without the privilege check, and when it’s time to commit to whether or not the speculative execution should be continued, the check will occur. But during that window, you’ve got the opportunity to run a batch of instructions against the cache without privilege checks. So you can write code with the right sequence of branch instructions to get branch prediction to work the way you want it to; and then you can use that to read memory that you shouldn’t be able to read.
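The window described above can be modeled in a few lines. What follows is a deliberately simplified simulation, not working exploit code and not a description of any real core: the `ToyCPU` class, its methods, and the idea of representing the cache as a set of "lines" are all assumptions made for illustration. The point is only the ordering: the dependent access leaves a trace in the cache before the privilege check rejects the read.

```python
class ToyCPU:
    """A toy model of a late privilege check. Hypothetical, for illustration."""

    def __init__(self, kernel_memory):
        self.kernel_memory = kernel_memory  # privileged bytes
        self.cached_lines = set()           # probe lines currently "in cache"

    def flush(self):
        # Stand-in for evicting the probe lines before each attempt.
        self.cached_lines.clear()

    def user_read(self, address):
        # Step 1: speculative work runs before the privilege check. The
        # secret byte selects which probe line gets pulled into the cache.
        secret = self.kernel_memory[address]
        self.cached_lines.add(secret)
        # Step 2: the check finally happens and the read is rejected.
        # The architectural result is discarded, but the cache is unchanged.
        raise PermissionError("user code may not read kernel memory")

    def is_fast(self, line):
        # Stand-in for timing a load: cached lines come back "fast".
        return line in self.cached_lines


def leak_byte(cpu, address):
    cpu.flush()
    try:
        cpu.user_read(address)
    except PermissionError:
        pass  # the fault is expected; the side effect is what matters
    # Probe all 256 possible byte values; the cached one reveals the secret.
    for value in range(256):
        if cpu.is_fast(value):
            return value
    return None


secret = b"hunter2"
cpu = ToyCPU(kernel_memory=secret)
recovered = bytes(leak_byte(cpu, i) for i in range(len(secret)))
print(recovered)  # b'hunter2'
```

In this model the fix is obvious: do the check before step 1. On real hardware that ordering is baked into the silicon, which is why the mitigation has to come from the operating system instead.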

The speculative execution implementations of other CPU vendors don’t allow user-space applications to probe the contents of kernel space memory at any point. The only way to mitigate Meltdown in software is to force the system to perform a full context switch every time it switches between kernel and user memory space. How much this patch hurts depends on how often an application has to make that switch, which is why the performance impact from Meltdown varies so widely.
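A simple cost model makes the spread easier to picture. The sketch below assumes the patch adds a fixed amount of time to every kernel/user transition and compares two hypothetical applications. Both the 0.4-microsecond cost and the transition rates are invented for illustration and are not measurements.

```python
def time_lost_fraction(transitions_per_second, extra_seconds_per_transition):
    """Fraction of each second spent on the added switching cost."""
    return transitions_per_second * extra_seconds_per_transition


EXTRA_COST = 0.4e-6  # hypothetical added cost per transition, in seconds

# Hypothetical workloads, described by kernel/user transitions per second.
workloads = {
    "compute-bound app": 1_000,
    "I/O-heavy server": 500_000,
}

overhead = {
    name: time_lost_fraction(rate, EXTRA_COST)
    for name, rate in workloads.items()
}

for name, fraction in overhead.items():
    print(f"{name}: {fraction:.2%} of CPU time lost")
```

Under these assumptions the compute-bound application barely notices the patch, while the server that constantly crosses into the kernel gives up a fifth of its CPU time.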

The Solution to These Problems is Going to Whack Some Folks

We’ve focused on Meltdown here, because that’s the flaw wrapped around speculative execution, but there are Windows patches rolling out for both Spectre variants and Meltdown, and the performance hit associated with Spectre mitigation seems like it’ll hit some systems pretty hard.

According to a Microsoft blog post, Windows 10 users with Skylake, Kaby Lake, or Coffee Lake should see performance declines in single digits. Users with Haswell or earlier CPUs using Windows 10 aren’t so lucky. Microsoft reports that “some benchmarks show more significant slowdowns, and we expect that some users will notice a decrease in system performance.”

If you’re running Windows 7 or Windows 8, you’re going to get hit harder. Microsoft writes that it expects most users to experience a decrease in performance. It is not yet clear whether Intel and AMD take different performance hits for mitigating Spectre (AMD is still unimpacted by Meltdown). And we know that Meltdown’s impact on server utilization and some Linux benchmarks isn’t minor.

This situation is still evolving, but hopefully we’ve cleared up at least one part of the problem. If you’re using a Haswell or earlier system and wondering how much performance you’re going to be asked to give up, for what it’s worth, I’m right there with you. Suddenly my Ivy Bridge CPU looks like it might need replacing if the performance degradation is bad enough.
