New Utility Can Double AMD Threadripper 2990WX Performance
AMD’s 32-core 2990WX Threadripper CPU has always been a bit of an uncertain proposition. While undeniably fast in certain scenarios, the chip has marked performance regressions in other tests, and doesn’t always outperform the 16-core Threadripper 2950X. Now, there’s a utility, CorePrio, that can be used to restore much of the 2990WX’s missing performance under Windows 10.
Level1Techs has published an extensive report on its investigation of 2990WX performance. The initial assumption, that memory bandwidth congestion is responsible for lower overall performance, is not wrong in all cases but has proven incomplete. Level1 found the same performance regressions in an Epyc 7551 it tested, which has eight memory channels instead of Threadripper’s four. On both chips, performance under Linux was fine, while performance under Windows suffered. Level1 also found that changing Windows CPU affinities produced strange behavior and altered overall benchmark results.
What the investigation ultimately revealed is a problem with how Windows moves the threads of certain applications between cores on CPUs with more than one NUMA node. Level1 writes: “When only one NUMA node is recommended via the ‘ideal CPU’ the windows kernel seems to spend half the available CPU time just shuffling threads between cores.”
They continue:
Here’s an interesting twist: If you only have one OTHER NUMA node – windows seems to fall back to allowing the threads to establish themselves on the second NUMA node… This is most likely related to a bugfix from Microsoft for 1 or 2 socket Extreme Core Count (XCC) Xeons wherein a physical Xeon CPU has two numa nodes. In the past (with Xeon V4 and maybe V3), one of these NUMA nodes has no access to I/O devices (but does have access to memory through the ring bus).
If that’s true, then that work-around to make sure this type of process stays on the “ideal CPU” in the same socket has no idea what to do when there is more than one other NUMA node in the same package to “fail over” to.
The solution to this is a utility named CorePrio.
CorePrio solves this problem and allows threads to be scheduled evenly across the CPUs rather than Windows spending all of its time trying to shuffle them across the die. It looks as though the 2990WX’s sharp performance regressions were caused, at least in part, by Windows spending far more time moving workloads from CPU to CPU than it spent actually executing work. Obviously, this won’t boost Threadripper’s performance in applications where it already scaled well, but it should fix the performance regressions in multiple applications.
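To make “scheduled evenly across the CPUs” concrete, here is a minimal sketch of spreading threads round-robin across NUMA nodes. It is not CorePrio’s actual algorithm, which the report does not describe. The topology (four nodes of 16 logical CPUs each, numbered contiguously) and every function name below are assumptions made for illustration.

```python
from collections import Counter


def build_topology(num_nodes=4, cpus_per_node=16):
    """Hypothetical layout: node n owns a contiguous block of logical CPUs."""
    return [
        list(range(n * cpus_per_node, (n + 1) * cpus_per_node))
        for n in range(num_nodes)
    ]


def spread_threads(num_threads, topology):
    """Give each thread one logical CPU, rotating across NUMA nodes."""
    next_slot = [0] * len(topology)  # next unused CPU index within each node
    placement = []
    for thread in range(num_threads):
        node = thread % len(topology)
        cpus = topology[node]
        # Wrap around if a node receives more threads than it has CPUs.
        placement.append(cpus[next_slot[node] % len(cpus)])
        next_slot[node] += 1
    return placement


def affinity_mask(cpu):
    """Single-CPU bitmask, the form that affinity APIs commonly expect."""
    return 1 << cpu


def threads_per_node(placement, cpus_per_node=16):
    """Count how many threads landed on each node."""
    return Counter(cpu // cpus_per_node for cpu in placement)


topology = build_topology()
placement = spread_threads(8, topology)
print(placement)
print(threads_per_node(placement))
```

With eight threads, each of the four hypothetical nodes receives two, so no single node becomes the only place threads are allowed to settle.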
It’s not yet clear whether the memory subsystem is also implicated. If threads are being allocated to the wrong NUMA node, it’s possible that memory accesses are being run mostly or entirely through a single memory controller. This would explain why an eight-channel Epyc in NUMA mode gives the same performance (with allowance for clock speed) as a four-channel TR. And there may well be applications that don’t scale well in the 2990WX’s NUMA configuration for reasons unrelated to any shortcomings in the Windows 10 scheduler.
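The single-controller hypothesis can be illustrated with some back-of-the-envelope arithmetic. The sketch below assumes two memory channels per memory-attached die on both chips and DDR4-2666 at 8 bytes per transfer. These are illustrative figures, not measurements from the Level1Techs report, and the function names are hypothetical.

```python
def channels_in_use(channels_per_node, nodes_serving_memory):
    """Channels actually carrying traffic, given how many nodes serve it."""
    return channels_per_node * nodes_serving_memory


def peak_bandwidth_gbs(channels, mt_per_s=2666, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (assumed DDR4-2666, 64-bit channels)."""
    return channels * mt_per_s * bytes_per_transfer / 1000


# Assumption: both chips attach two channels to each memory-connected die.
epyc_all_nodes = channels_in_use(2, 4)  # eight channels in total
tr_all_nodes = channels_in_use(2, 2)    # four channels in total
single_node = channels_in_use(2, 1)     # traffic funneled through one die

print(peak_bandwidth_gbs(epyc_all_nodes))
print(peak_bandwidth_gbs(tr_all_nodes))
print(peak_bandwidth_gbs(single_node))
```

Under these assumptions, once traffic is confined to one node, the eight-channel and four-channel parts have the same two channels available, which is consistent with the similar results described above.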
The full scope of the bug and its potential fixes hasn’t been fleshed out yet, as the “fixes unknown Windows perf issue” description suggests. Microsoft and AMD have not yet issued formal responses, and it’s not clear what the timeline is for fixing this problem via an OS update. But if you’re a 2990WX owner or were interested in becoming one, this could change the calculus on whether the chip is worth investing in, provided you’re a very particular kind of customer in the first place. Average and even not-so-average gamers need not apply, as chips like the 2990WX play in a very rarefied space to start with.