Nvidia’s Titan V Accused of Returning Wrong Answers in Simulations


Nvidia has long held the pole position in GPGPU computing, particularly in scientific and HPC applications. The company's long-term investment in CUDA and high-performance computing has won it a number of spots on the TOP500 supercomputer list and fueled the growth of its Tesla product line. Its lineup also includes the $3,000 Titan V, a Volta-based graphics card that straddles the line between a consumer and a scientific product. But all may not be well with the Titan V: there are reports that the card can produce different results from run to run on identical workloads.

That’s the word from The Register, which writes:

One engineer told The Register that when he tried to run identical simulations of an interaction between a protein and enzyme on Nvidia’s Titan V cards, the results varied. After repeated tests on four of the top-of-the-line GPUs, he found two gave numerical errors about 10 per cent of the time. These tests should produce the same output values each time again and again. On previous generations of Nvidia hardware, that generally was the case. On the Titan V, not so, we’re told.

The Reg goes on to note that it also spoke to an "industry veteran," who speculated that the problem may stem from the card's onboard HBM2 memory. The same source noted that Nvidia had encountered this kind of issue before and had been forced to issue patches to address it.


Elsewhere, other communities have suggested that the problem could be overblown. Parallel floating-point computation is not necessarily deterministic; it does not automatically yield bit-identical results on every run. Because floating-point addition is not associative, if the order of operations differs from run to run, the final result can differ as well.
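To make the order-of-operations point concrete, here is a minimal Python sketch that runs on the CPU, not a GPU. The data set is made up, and shuffling the summation order is only a hypothetical stand-in for GPU threads finishing in a different order on each run. It shows that regrouping the same values changes the rounded result, that a fixed summation order is bitwise repeatable, and that varying the order can shift the last bits of the total.

```python
import math
import random

# Floating-point addition is not associative: regrouping the same three
# values changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False


def ordered_sum(values, order):
    """Sum values in the given index order, rounding after every addition."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Hypothetical data: values spanning many orders of magnitude, which makes
# rounding differences between summation orders easier to expose.
rng = random.Random(42)
values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8) for _ in range(10_000)]

# Fixed order on every "run": the results are bitwise identical.
fixed = list(range(len(values)))
fixed_results = {ordered_sum(values, fixed) for _ in range(5)}

# A different order on each "run" (assumed stand-in for thread scheduling):
# the totals may differ in their low-order bits.
shuffled_results = set()
for seed in range(5):
    order = fixed[:]
    random.Random(seed).shuffle(order)
    shuffled_results.add(ordered_sum(values, order))

reference = math.fsum(values)  # correctly rounded sum, for comparison
print(len(fixed_results))      # 1
print(len(shuffled_results))   # distinct totals across shuffled runs
print(max(abs(s - reference) for s in shuffled_results))
```

Differences of this kind are small and bounded by rounding error, which is why repeatable, fixed-order code producing divergent answers across identical runs would point to something other than ordinary floating-point non-determinism.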

It seems unlikely, however, that scientists and researchers would mistake a known phenomenon (non-deterministic output in parallel floating-point calculations) for a significant hardware fault. The Reg's source said two of the four Titan V cards tested gave numerical errors about 10 percent of the time, but did not provide details on which applications were affected, whether the frequency of the problem varied from application to application, or whether it could be influenced by changing various GPU settings.

Right now, we have more questions than answers. The problem, if it exists, might be addressable with a driver update or a code change. It might also reflect a flaw in the GPU's memory subsystem, as The Reg's source speculates. The developers of some HPC applications have posted notices on their websites saying they are aware of the potential issue but have not yet seen it themselves. It's also possible that the issue is limited to a handful of cards and is not indicative of a general problem.

As for Nvidia, the company told The Reg it is aware of the issue and has invited anyone affected to contact it directly. The Titan V isn't really positioned as a gaming GPU, but games do not appear to be affected at this time.