How to Bypass Matlab’s ‘Cripple AMD CPU’ Function


One of the difficulties of CPU reviews is that they represent the best time to evaluate new features and software — while simultaneously representing the worst possible time to attempt to do a deep dive on any specific piece of software. Sometimes, reviewers adopt tests because a vendor has recommended them, without considering whether the test will perform identically on an Intel versus an AMD system. Sometimes, the vendor fails to disclose that an application is compiled in a manner that will lead to tests running much faster on one platform as opposed to another. This is one of those times.

When I published Matlab data in our Threadripper 3970X / Cascade Lake X joint review, it was because Intel had recommended this test and workload as a showcase for Intel’s HEDT desktop line. I specifically asked for recommendations, hoping that Intel would have some applications in mind that would show relatively light scaling at or above the 18-core mark with AVX-512 integration. Even professional apps don’t scale perfectly forever, and I knew going into this review that there was going to be a performance “island” for Intel to stand on at the intersection of higher clocks and lightly threaded applications. “Lightly,” in this context, should be understood to mean “apps that don’t scale all the way to 64 threads” as opposed to “apps that don’t scale past 4-8 threads,” which is usually what we mean when we call an app lightly threaded. It was obvious that Threadripper 3960X and 3970X were going to beat the 10980XE in every app that could scale to match their thread counts, especially in the 3970X’s case. With that as a given, it was worth exploring the areas that had historically been the strongest for Intel to see how performance would compare.

Intel recommended four workloads for this review: AIXPRT, Adobe Premiere Pro, Matlab, and Sony Catalyst. I wanted to spend more time evaluating AIXPRT before I started running it on systems, which made it less appealing. Adobe now requires that you provide a credit card in order to launch a 7-day free trial of its software, so that’s right out. I opted to test Matlab and Sony Catalyst. At the time, I was not aware of an investigation and report by Reddit user Nedflanders1976, posted some eight days earlier.

He writes:

Matlab runs notoriously slow on AMD CPUs for operations that use the Intel Math Kernel Library (MKL). This is because the Intel MKL uses a discriminative CPU Dispatcher that does not use efficient codepath according to SIMD support by the CPU, but based on the result of a vendor string query. If the CPU is from AMD, the MKL does not use SSE3-SSE4 or AVX1/2 extensions but falls back to SSE1 no matter whether the AMD CPU supports more efficient SIMD extensions like AVX2 or not.
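To make the distinction concrete, here is a minimal Python sketch contrasting the two dispatch strategies the quote describes. The function names, path labels, and feature sets are hypothetical; this illustrates the logic only and is not MKL’s actual dispatcher.

```python
from __future__ import annotations

# Hypothetical code-path labels, ordered from most to least capable.
PATHS = ("avx2", "avx", "sse4_2", "sse3", "sse")


def dispatch_by_features(features: set[str]) -> str:
    """Pick the most capable path the CPU reports supporting."""
    for path in PATHS:
        if path in features:
            return path
    return "scalar"  # no SIMD support reported at all


def dispatch_by_vendor(vendor: str, features: set[str]) -> str:
    """Check the vendor string first, as the Reddit report describes."""
    if vendor != "GenuineIntel":
        # Non-Intel CPUs get the SSE1 baseline, whatever they support.
        return "sse"
    return dispatch_by_features(features)


if __name__ == "__main__":
    zen2 = {"sse", "sse3", "sse4_2", "avx", "avx2"}
    print(dispatch_by_vendor("AuthenticAMD", zen2))  # sse
    print(dispatch_by_features(zen2))                # avx2
```

The same AVX2-capable chip lands on the slowest path under the first strategy and the fastest under the second; only the vendor string differs.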

There’s a way to disable this behavior in Matlab, Flanders writes. If you are a Windows user with Matlab installed, create a batch file containing the following:

```bat
@echo off
rem Undocumented MKL switch: 5 selects the AVX2 code path
set MKL_DEBUG_CPU_TYPE=5
matlab.exe
```

Start the application using this batch file. You can make the change permanent by adding MKL_DEBUG_CPU_TYPE, with a value of 5, to the System Environment Variables. Nedflanders1976 also has details on how to perform this task on Linux. We also tested a few variations, including setting “MKL_DYNAMIC=FALSE” and “MKL_NUM_THREADS=64”, to see whether they would improve performance. They did not. The best performance came from the settings above.
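For readers who would rather not maintain a batch file, the same per-process override can be applied from a small Python launcher that works on Windows and Linux alike. This is a sketch under stated assumptions: it assumes the Matlab executable is reachable on PATH under the hypothetical name in MATLAB_CMD, and because MKL_DEBUG_CPU_TYPE is undocumented, other MKL builds may not honor it.

```python
from __future__ import annotations

import os
import shutil
import subprocess

# Hypothetical default; replace with the full path if Matlab is not on PATH.
MATLAB_CMD = "matlab"


def mkl_env(base: dict[str, str] | None = None) -> dict[str, str]:
    """Return a copy of the environment with the MKL override added."""
    env = dict(os.environ if base is None else base)
    env["MKL_DEBUG_CPU_TYPE"] = "5"  # undocumented; other MKL builds may ignore it
    return env


def launch_matlab(cmd: str = MATLAB_CMD) -> int:
    """Start Matlab with the override applied and return its exit code."""
    exe = shutil.which(cmd)  # on Windows this also resolves matlab.exe
    if exe is None:
        raise FileNotFoundError(f"{cmd!r} was not found on PATH")
    return subprocess.call([exe], env=mkl_env())
```

Calling launch_matlab() applies the override only to the Matlab process it starts and leaves the system-wide environment untouched, which is the main difference from editing the System Environment Variables.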

Updated Matlab Results

I have updated our Matlab results with new data showing the impact of running the application in this mode. The total summary time for the entire workload appears at the bottom of each set of results. The top chart shows the performance of the three CPUs we compared without any changes; the bottom chart shows the impact of setting MKL_DEBUG_CPU_TYPE=5. This may work for other applications that use MKL as well. It should be noted that in many cases, the CPU is only ~53-55 percent loaded during this test, a load level that correlates to 17-18 processor threads. Even so, the settings above proved faster than forcing MKL to use more threads: telling it to use 48 or 64 threads only increased total execution time on the 3970X.
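Comparisons like the thread-count experiment above can be scripted so that each configuration runs under identical conditions. The Python sketch below times one command under several environment settings and keeps the best of several runs to reduce noise. The configuration names and EXAMPLE_CMD are hypothetical stand-ins; substitute the command that actually runs your Matlab benchmark.

```python
from __future__ import annotations

import os
import subprocess
import sys
import time

# Hypothetical configurations mirroring the variations tested above.
# "default" inherits whatever is already set in the parent environment.
CONFIGS = {
    "default": {},
    "cpu_type_5": {"MKL_DEBUG_CPU_TYPE": "5"},
    "cpu_type_5_64_threads": {
        "MKL_DEBUG_CPU_TYPE": "5",
        "MKL_DYNAMIC": "FALSE",
        "MKL_NUM_THREADS": "64",
    },
}

# Stand-in command; replace with whatever launches your real benchmark.
EXAMPLE_CMD = [sys.executable, "-c", "pass"]


def time_configs(
    cmd: list[str],
    configs: dict[str, dict[str, str]] = CONFIGS,
    runs: int = 3,
) -> dict[str, float]:
    """Return the best wall-clock time, in seconds, for each configuration."""
    results = {}
    for name, overrides in configs.items():
        env = {**os.environ, **overrides}  # overrides win over inherited values
        best = float("inf")
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(cmd, env=env, check=True)  # raises if the run fails
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results
```

Running print(time_configs(EXAMPLE_CMD)) prints one timing per configuration. Best-of-N is used rather than a mean because background activity can only make a run slower, never faster. If MKL_DEBUG_CPU_TYPE is already set system-wide, the "default" entry will not be a true baseline.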


AMD’s performance improves by 1.32x – 1.37x overall. Individual test gains are sometimes much larger. Obviously these results are much worse for Intel, changing what looked like a narrow victory over the 3960X and a good showing against the 3970X into an all-out loss.
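Because the charts report total execution time, a speedup factor also tells you how much wall-clock time disappears: the fraction saved is 1 - 1/factor. A quick Python check using the overall factors reported above:

```python
def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the 'after' run is than the 'before' run."""
    return before_s / after_s


def time_saved(factor: float) -> float:
    """Fraction of total run time eliminated by a given speedup factor."""
    return 1.0 - 1.0 / factor


for factor in (1.32, 1.37):
    print(f"{factor}x -> {time_saved(factor):.1%} less time")
# 1.32x -> 24.2% less time
# 1.37x -> 27.0% less time
```

In other words, the overall gains correspond to roughly a quarter of the total run time being eliminated.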

When Is It Alright to Test These Sorts of Applications?

I was not aware of Matlab’s behavior when I agreed to benchmark the application for the Threadripper 3960X / 3970X / Cascade Lake launch, but this is an excellent time to discuss the topic. The fact is, Matlab ships in a configuration that is automatically biased against AMD: through the MKL, it refuses to run its SSE3, SSE4, and AVX/AVX2 code paths on an AMD CPU, even though the CPU supports the instructions in question.

It is not wrong to benchmark a real-world application. The performance of a real-world application you have to use may well be relevant to the hardware choices you make. If your job depends on running workloads in an application that heavily favors Intel microprocessors, you’re likely to buy Intel CPUs, even if you’d prefer to buy chips based on ARM or AMD designs. People deserve to know how the software that they run actually performs on the hardware that they use, and Matlab is a major piece of software used by more than 3 million people. The fact that Matlab favors Intel CPUs doesn’t mean Matlab users don’t deserve to know how the application performs. Of course they do.

Flanders asks that you contact MathWorks, the company behind Matlab, with a feature request if you want a solution to this problem, but you’ll need to be a Matlab subscriber already to submit anything. Regardless of whether the company changes its approach, we feel end users need to be aware of how to bypass this issue and restore full performance to AMD CPUs.

I might have included the Matlab test even if Intel had disclosed that the application uses Intel-specific optimization paths; I was looking for tests that favored Intel to compare against AMD’s much higher core counts. I had no intention of positioning these tests as anything other than what they were: best-case scenarios for Intel, but realistic scenarios nonetheless. I tested y-cruncher 0.78 for this review specifically because it’s an example of an AVX-512-optimized application where that SIMD set gives Intel a significant speed boost. I don’t have a problem with showcasing Intel or AMD CPUs to their best advantage. I just want to know when I’m doing it.

Readers will ask why I haven’t jumped down Intel’s throat given the history of the “Cripple AMD” compiler issue. Let me reassure you: I’m fully aware of it. That history is exactly why Intel should have been careful about which tests it recommended and what it communicated about them. But I don’t have copies of every scrap of Intel guidance from the last few HEDT launch cycles, and I can’t say whether the test was added for this cycle or whether Intel had recommended it in previous cycles as well. Even if Matlab wasn’t mentioned in public benchmark guidance for earlier CPUs, I had specifically reached out to Intel to request tests that might incorporate capabilities like AVX-512. Intel should have practices in place to make sure reviewers know about pitfalls like this, but I can’t say for certain that this wasn’t a mistake.

With that said:

Readers should be aware that I also scanned Sony Catalyst with the Swallowtail patches, which can remove the “Cripple AMD” function from executables, and found no sign of pro-Intel bias: after the patches had run, the Threadripper 3970X completed the Sony Catalyst workload in the same amount of time as before. At this point, however, the Swallowtail patches are 10 years old. While I’ve confirmed that they work on old software, it’s not clear that they can detect the methods still being used to prevent code from executing optimally on AMD processors. I am no longer certain whether Sony Catalyst Edit represents the kind of lightly threaded app I was hoping to test for the 10980XE review, or whether it uses as-yet-undetected preferential code paths to improve performance on Intel CPUs. At the very least, I’m not as certain as I’d like to be.
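As a rough illustration of why detection is hard, the Python sketch below flags files that contain the literal CPUID vendor strings. It is a hypothetical heuristic, not a description of how the Swallowtail patches work, and a clean result proves nothing.

```python
from __future__ import annotations

import sys
from pathlib import Path

# The 12-byte vendor strings that CPUID reports on Intel and AMD CPUs.
VENDOR_STRINGS = (b"GenuineIntel", b"AuthenticAMD")


def find_vendor_strings(data: bytes) -> dict[bytes, list[int]]:
    """Return the byte offsets of each vendor string found in data.

    Crude heuristic only: a vendor check may compare CPUID registers
    against 32-bit constants, so the full string never appears
    contiguously, and indirect or obfuscated checks leave no trace here.
    """
    hits: dict[bytes, list[int]] = {}
    for needle in VENDOR_STRINGS:
        offsets = []
        start = data.find(needle)
        while start != -1:
            offsets.append(start)
            start = data.find(needle, start + 1)
        if offsets:
            hits[needle] = offsets
    return hits


if __name__ == "__main__":
    # Usage: python scan.py some_binary.exe another.dll
    for path in sys.argv[1:]:
        hits = find_vendor_strings(Path(path).read_bytes())
        print(path, {k.decode(): len(v) for k, v in hits.items()})
```

A hit only shows that a binary mentions a vendor; it says nothing about what the code does with that information, which is why a tool-based scan is weaker evidence than timing the workload before and after a patch.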

I will say one more thing on this issue. As far as I’m personally concerned, any piece of software that claims to support AVX, AVX2, SSE, or any other SIMD code should prominently state whether that code executes solely on supported Intel microprocessors. Failing to inform customers that your software won’t execute ideal code on their platform due to hard-coded limits in your application ought to constitute false advertising. AMD advertises its CPUs based on factors like AVX2 support, but software vendors are under no obligation to inform you whether you’ll be able to use features you literally paid for. This should not be the case. Multi-million dollar software developers are capable of performing the due diligence required to be honest about their optimization practices.
