Clever OS Scheduling Partly Explains Apple M1’s Responsiveness

When Apple launched the M1, one of the persistent observations from end-users was how responsive the CPU felt, even during ordinary desktop usage. Now, a macOS developer has found clues to how Apple pulled off the improvement. It isn’t a matter of boosting CPU performance, at least not exactly. What Apple has done is change how macOS responds to quality-of-service (QoS) settings and how workloads are scheduled on the chip.

Dr. Howard Oakley is an author, Mac developer, and former Royal Navy Surgeon. He’s recently written about his comparisons between an M1-powered Mac and his Xeon-based Mac Pro, and how differently macOS behaves on the two machines.

Before we dive into his findings, I’d like to toss in a bit of historical context. One of the challenges of testing Hyper-Threading, back when it debuted on the Pentium 4, was the difficulty of measuring exactly how it impacted system behavior. Scott Wasson, then of the Tech Report, coined the term “creamy smoothness” to describe how the P4 behaved under load compared to an HT-less CPU. Even though the AMD Athlon XPs of the day might be faster in a single-threaded workload, Hyper-Threading kept the system responsive.

Fast forward to 2008-2009, and the launch and popularity of Intel’s first-generation Atom. While no Bonnell-powered Atom system had much of a CPU to speak of, netbooks based on Nvidia’s Ion chipset felt like they were in an entirely different class of device. Even though the integrated Nvidia GPU only offloaded the Windows 7 UI, it made Ion feel distinctly up-market compared with the Intel 945 chipset.

We have, therefore, historical background from Windows to demonstrate how much proper task offloading can improve responsiveness. In the modern era, macOS allows developers to define different QoS levels. On an x86 CPU, Dr. Oakley’s testing shows that a thread executes as quickly as possible at any QoS setting, so long as an application with a higher QoS doesn’t preempt it. In his testing, this worked out to a consistent 5.6 – 6.6 second compression time for a 10GB file. Testing multiple instances of the application simultaneously showed that the instance with a higher QoS executed in the same 5.6 – 6.6s window, while the instance with a lower QoS took as long as 24 seconds. All of this is more-or-less equivalent to what we’d expect from Windows.

The M1, however, does not behave this way. Here’s Dr. Oakley:

All operations with a QoS of 9 (background) were run exclusively on the four Efficiency (Icestorm) cores, even when that resulted in their being fully loaded and the Performance cores remaining idle. Operations with any higher QoS, from 17 to 33, were run on all eight cores.

Apple, in other words, has changed the way macOS treats the M1 to prioritize responsiveness. Instead of being used to execute background tasks or OS updates, the Firestorm cores are reserved for high-priority applications. If an application demands maximum performance, it can still run across all eight cores, though this is more likely to cause some degree of desktop lag. The system will preferentially run OS tasks in the background on the Icestorm cores, even when this makes them execute much more slowly, in the name of keeping power consumption low.
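
To make the rule Dr. Oakley describes concrete, here is a minimal Python sketch of it. This is a toy model, not Apple’s scheduler: the function name `eligible_cores` and the core labels are invented for illustration, and the numeric values for the QoS levels between 9 and 33 are the raw values macOS assigns to its standard QoS classes. How macOS treats values below 9 or between the named levels is not covered by the testing described here, so the cutoff in the code is an assumption.

```python
from enum import IntEnum


class QoS(IntEnum):
    """Raw values of the standard macOS QoS classes."""
    BACKGROUND = 9
    UTILITY = 17
    USER_INITIATED = 25
    USER_INTERACTIVE = 33


# Hypothetical labels for the M1's four Icestorm and four Firestorm cores.
EFFICIENCY_CORES = ("E0", "E1", "E2", "E3")
PERFORMANCE_CORES = ("P0", "P1", "P2", "P3")


def eligible_cores(qos: int) -> tuple:
    """Return the cores a thread at this QoS may run on, per the observed rule.

    Assumption: anything at or below the background level is treated as
    background. The article only reports behavior at 9 and at 17 through 33.
    """
    if qos <= QoS.BACKGROUND:
        # Background work stays on the Efficiency cores, even if they are
        # fully loaded and the Performance cores sit idle.
        return EFFICIENCY_CORES
    # Any higher QoS may be scheduled across all eight cores.
    return EFFICIENCY_CORES + PERFORMANCE_CORES


if __name__ == "__main__":
    for level in QoS:
        cores = ", ".join(eligible_cores(level))
        print(f"{level.name:<16} (QoS {int(level):>2}): {cores}")
```

The point of the model is that the rule is asymmetric: low-priority work is confined to a subset of cores, while high-priority work is never confined.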

There’s no specific reason why an x86 CPU couldn’t be run in this fashion. While x86 CPUs are still almost entirely homogeneous, the OS could hypothetically dedicate a specific set of cores to processing background tasks, while reserving the rest for peak performance.
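
As a rough illustration of what dedicating cores to background work could look like on a homogeneous CPU, here is a short Python sketch. It is an assumption-laden example rather than anything an OS vendor ships: `partition_cores` and `pin_current_process` are hypothetical helpers, the choice of the lowest-numbered cores for background work is arbitrary, and `os.sched_setaffinity` is only available on some Unix platforms such as Linux.

```python
import os


def partition_cores(core_ids, background_count):
    """Split core IDs into a (background, foreground) pair of sets.

    Assumption: the lowest-numbered cores are set aside for background
    work. Any other fixed split would serve the same purpose.
    """
    cores = sorted(core_ids)
    if not 0 < background_count < len(cores):
        raise ValueError("need at least one core in each group")
    return set(cores[:background_count]), set(cores[background_count:])


def pin_current_process(cores):
    """Restrict the calling process to the given cores, where supported.

    Returns False on platforms without os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, cores)  # pid 0 means the calling process
    return True


if __name__ == "__main__":
    # Example: an eight-core CPU with two cores set aside for background work.
    background, foreground = partition_cores(range(8), 2)
    print("background cores:", sorted(background))
    print("foreground cores:", sorted(foreground))
```

A background job that calls `pin_current_process(background)` at startup would stay off the foreground cores, which is a manual approximation of what macOS does automatically for QoS 9 work on the M1.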

This is, at minimum, a clever way for Apple to improve the end-user experience. Intel is moving to its own hybrid architecture with Alder Lake later this year, and we may see Windows 10 + hybrid x86 CPUs deploy a similar system for minimizing power consumption while maximizing performance.
