Hewlett Packard Enterprise is prepping the most powerful ARM-based supercomputer ever built for installation at Sandia National Laboratories later this year. The system, codenamed Astra, will be built around HPE’s Apollo 70 platform, with 2,592 dual-socket nodes and a total of 145,152 cores.
Unlike most supercomputers today, including nearly the entire TOP500 list, Astra isn’t built around the x86 architecture. It shares that distinction with just 24 other systems on the existing TOP500, with the other 476 systems based on Intel or AMD hardware. Astra will pack 5,184 ARM CPUs, each a 28-core ThunderX2 built by Cavium. Each CPU is clocked at 2GHz, for a theoretical peak of 2.3 petaflops, a figure that would place the new system at roughly #87 on the latest TOP500 list, released in November 2017. By the time Astra is fully online, new lists will have been published, which is probably why the press materials mention the more generic “Top 100” of the TOP500 rather than attempting to state exactly where the new system will rank.
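The core count and the peak figure can be checked with a few lines of arithmetic. The sketch below uses the node, socket, core, and clock numbers quoted above; the value of 8 double-precision floating-point operations per core per cycle is an assumption that is not stated in the article, chosen because it reproduces the quoted 2.3 petaflops.

```python
# Figures quoted in the article
NODES = 2_592
SOCKETS_PER_NODE = 2
CORES_PER_CPU = 28
CLOCK_HZ = 2.0e9  # 2GHz

# ASSUMPTION (not from the article): each core retires 8 double-precision
# floating-point operations per cycle.
FLOPS_PER_CORE_PER_CYCLE = 8

total_cpus = NODES * SOCKETS_PER_NODE
total_cores = total_cpus * CORES_PER_CPU

# Theoretical peak = cores x clock x operations per cycle
peak_flops = total_cores * CLOCK_HZ * FLOPS_PER_CORE_PER_CYCLE
peak_pflops = peak_flops / 1e15

print(f"CPUs:  {total_cpus:,}")
print(f"Cores: {total_cores:,}")
print(f"Peak:  {peak_pflops:.2f} petaflops (theoretical)")
```

Under that assumption, the CPU and core totals match the 5,184 and 145,152 given in the article, and the peak lands at roughly 2.3 petaflops. This is a theoretical ceiling, not a measured benchmark result.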
Astra will “hands down be the world’s largest ARM-based supercomputer ever built,” Mike Vildibill, VP of the Advanced Technologies Group at HPE, told ZDNet. “The government views ARM as one of several microprocessors that are important for achieving exascale in the future.”
Keeping workloads local and moving data minimally are key goals in the push to build exascale-class computing hardware, given the tremendous energy cost of moving data from Point A to B. “We can see clearly that the amount of power required to move data inside the system is an order of magnitude greater than the amount of power needed to compute that data,” explained Vildibill.
In recent years, the US government has doled out $258M in exascale computing grants to a number of companies, including AMD, Intel, HPE, Cray, IBM, and Nvidia. Astra’s overall power consumption will be 1.2MW, an amount the TOP500 press release characterizes as “respectable energy efficiency.” The system will be backed by 350TB of storage and is intended to search for new methods of managing America’s aging nuclear arsenal. The installation is also a test run for ARM hardware in this class of deployment, meant to examine its performance characteristics in scenarios where ARM hasn’t historically been used.
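To put the “respectable energy efficiency” remark in numbers, the two figures quoted in the article (2.3 peak petaflops and 1.2MW) can be divided directly. This is only a rough sketch: it uses theoretical peak rather than a measured benchmark score, so the result is an upper bound and should not be compared one-to-one with measured efficiency rankings.

```python
# Figures quoted in the article
PEAK_FLOPS = 2.3e15   # 2.3 petaflops, theoretical peak
POWER_WATTS = 1.2e6   # 1.2MW overall power consumption

# Peak efficiency in gigaflops per watt (an upper bound, since real
# workloads do not sustain theoretical peak)
gflops_per_watt = PEAK_FLOPS / POWER_WATTS / 1e9

print(f"Peak efficiency: about {gflops_per_watt:.2f} GFLOPS per watt")
```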
Sandia’s calculations are reportedly bandwidth-limited, and some of Cavium’s ThunderX2 CPUs have up to eight DDR4 memory controllers, capable of providing up to 170GB/s of bandwidth per socket. HPE is making this push as part of its overall Memory-Driven Computing architecture approach, in which the company is emphasizing the number of memory channels and total connected RAM it can offer with its servers. In this case, you don’t technically need to go with ARM to maximize per-socket memory bandwidth (AMD also uses up to eight channels in its Epyc line of CPUs), but the move to deploy a test ARM system is also important for validating that ARM hardware and servers are up to the challenge of serving in this environment.
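The 170GB/s per-socket figure is consistent with eight channels of DDR4 running at 2,666 megatransfers per second, each moving 8 bytes per transfer. The memory speed and channel width are assumptions used for illustration, as the article gives only the channel count and the total. The sketch also estimates how many bytes of memory bandwidth are available per peak floating-point operation, again assuming 8 operations per core per cycle, which is not stated in the article.

```python
# Figure quoted in the article
MEMORY_CHANNELS = 8

# ASSUMPTIONS (not from the article): DDR4-2666 memory and a 64-bit
# (8-byte) data path per channel.
TRANSFERS_PER_SECOND = 2666e6
BYTES_PER_TRANSFER = 8

# Per-socket bandwidth = channels x transfer rate x bytes per transfer
socket_bandwidth = MEMORY_CHANNELS * TRANSFERS_PER_SECOND * BYTES_PER_TRANSFER
socket_bandwidth_gb = socket_bandwidth / 1e9

# Per-socket peak compute, using the article's 28 cores at 2GHz and an
# ASSUMED 8 double-precision operations per core per cycle.
CORES_PER_CPU = 28
CLOCK_HZ = 2.0e9
FLOPS_PER_CORE_PER_CYCLE = 8
socket_peak_flops = CORES_PER_CPU * CLOCK_HZ * FLOPS_PER_CORE_PER_CYCLE

# Bytes of memory traffic available per peak floating-point operation
bytes_per_flop = socket_bandwidth / socket_peak_flops

print(f"Per-socket bandwidth: {socket_bandwidth_gb:.1f} GB/s")
print(f"Bytes per peak flop:  {bytes_per_flop:.2f}")
```

Under these assumptions, each socket can fetch well under one byte from memory for every floating-point operation it can theoretically perform, which illustrates why a bandwidth-limited workload cares more about memory channels than about raw core counts.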