Tachyum Raises $25M for Universal Processor ‘Faster Than Xeon, Smaller Than ARM’

Startup Tachyum has raised $25M in a Series A funding round for a new processor design it calls the Prodigy Universal Processor. Prodigy is supposedly faster than Xeon in single-threaded code while using smaller CPU cores than ARM. According to the company, it can simulate human brain-sized neural networks in real time and outperforms CPUs, GPUs, and Google’s TPU. Tachyum says the chip can run 64 cores at an all-core frequency of 4GHz and fits into just 290mm² of die space (half the size of AMD’s 7nm Epyc design on the same node). It also offers eight channels of DDR5, 72 PCIe 5.0 lanes, two 400G Ethernet connections, and HBM3 support.

To say that Tachyum hasn’t proved these claims would be an understatement. Claiming to beat Intel or AMD in single-threaded performance, or ARM on die size and power efficiency, would be eyebrow-raising in the best of circumstances. Claiming to do both at once with a chip you haven’t even built yet requires better evidence than we’ve seen so far before the argument can be taken seriously. The company also says it will eventually field a CPU with 128 cores at 4GHz in a single socket, with 12 DDR5 memory controllers.

The company gave a presentation at Hot Chips last year, and its slides are now public.

Tachyum’s PR copy claims that Prodigy reduces data center TCO by 4x “through a disruptive hardware architecture and a smart compiler that has made many parts of the hardware found in a typical processor redundant. Fewer wires and shorter wires, due to a smaller, simpler core, translates into much greater speed and power efficiency for the processor.”

According to the Q&A session after the Hot Chips talk, these CPUs lose 40 percent of their performance when running x86 binaries under emulation, which seems like a major problem for the whole “Faster than Xeon” argument. The company claims that “Binary 4.0 GHz emulated still outperforms 2.5 GHz Xeon,” which would be more of a problem for Intel (or AMD) if a 2.5GHz Xeon represented some kind of objectively difficult performance threshold. A phrase like “Out of execution in software” is a fancy way of saying: “We shoved all the work of achieving high performance into the compiler, and we’re really hoping our compiler can extract enough performance to make this work.” Intel tried exactly this strategy with Itanium. It didn’t work.
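A rough back-of-envelope calculation shows what the emulation claim implies. This is only an illustration. It treats clock speed as a stand-in for throughput and assumes the 40 percent penalty applies uniformly. The symbol r is a hypothetical ratio of Prodigy’s native per-clock performance to that of the Xeon it is compared against. Tachyum has not published such a figure.

```latex
% Illustrative only: r = hypothetical native per-clock performance ratio (Prodigy / Xeon)
\text{Emulated Prodigy} \propto r \times 4.0\,\text{GHz} \times (1 - 0.40) = 2.4\,r\ \text{GHz-equivalent}
\\[4pt]
2.4\,r > 2.5 \;\Longrightarrow\; r > \frac{2.5}{2.4} \approx 1.04
```

Under these assumptions, the claim holds only if Prodigy’s native per-clock performance is at least about 4 percent higher than the Xeon’s. That figure would itself need to be demonstrated.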

That said, much about Prodigy’s architecture is still unclear. Forum debates continue over how closely it resembles Itanium, and whether its architecture is better understood as VLIW, modified VLIW, EDGE, or something else.

Based on what we’ve seen to date, Tachyum’s Prodigy is very long on sizzle. It’s supposedly the best parallel processor and the best serial processor, even though CPUs and GPUs run very different types of code. It’s also said to match or exceed Intel’s top-end chips while running within power envelopes and die sizes better than anything ARM or AMD can field.

Extraordinary claims require extraordinary evidence. We don’t have much of that just yet.

AMD Adjusts Ryzen Master ‘Fastest Core’ Rating to Match Windows 10

AMD is making some changes to Ryzen Master that will impact which cores it identifies as the fastest in a system.