The technology of artificial intelligence has become so prevalent in even the most complex domains of science that it now has its own suite of tests to measure its computing time on the world’s most powerful computers.
MLPerf, the industry consortium that measures how long it takes computers to run machine learning, a subset of artificial intelligence, on Wednesday offered an inaugural suite of test results for high-performance computing, or HPC, systems running machine learning tasks.
The test results, submitted by a variety of research labs, include results for the world’s fastest computer, Fugaku.
The effort reflects the fact that supercomputers are increasingly incorporating deep learning forms of AI into their calculation of traditional scientific problems.
“We saw an omission in that we didn’t have more scientifically-oriented workloads at a time when people are beginning to look at training as potentially an HPC workload, or coupled to, or a component of them,” said David Kanter, the head of MLPerf, in a briefing with reporters.
The new results join two existing benchmark test results, one that measures training on ordinary server systems, and one that measures machine learning inference, making predictions, on servers and on mobile devices.
The MLPerf staff that designed the tests are hosting a session Wednesday afternoon to discuss the effort at SC20, the annual supercomputing conference, which this year is being held as a virtual event because of the COVID-19 pandemic.
The tests specifically measure how many minutes the supercomputers take to train a deep learning network until it reaches proficiency in two tasks, called CosmoFlow and DeepCAM.
CosmoFlow, a collaboration between Intel, Hewlett Packard Enterprise’s Cray unit, and the Department of Energy’s National Energy Research Scientific Computing Center, or NERSC, uses a three-dimensional convolutional neural network to determine the cosmological parameters of the universe.
DeepCAM, a collaboration between Nvidia, Lawrence Berkeley National Laboratory, and Oak Ridge National Laboratory, is an image segmentation neural network application that learns to identify “extreme” weather phenomena in climate simulation data.
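To make the shape of such a task concrete, here is a minimal sketch of a CosmoFlow-style model: a three-dimensional convolutional network that regresses a handful of cosmological parameters from a volume of simulated matter density. The layer sizes, the 64-cubed input, and the four-parameter output are illustrative assumptions, not the actual CosmoFlow architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch of a CosmoFlow-style model: a 3D convolutional network that
# regresses a few cosmological parameters from a volume of simulated matter
# density. Layer sizes and the 4-parameter output are illustrative
# assumptions, not the published CosmoFlow architecture.
class Cosmo3DNet(nn.Module):
    def __init__(self, n_params: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),                     # 64^3 -> 32^3
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),                     # 32^3 -> 16^3
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 ** 3, 128),
            nn.ReLU(),
            nn.Linear(128, n_params),            # regression, not classes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

# One batch of fake 64^3 density volumes, just to show the shapes involved.
model = Cosmo3DNet()
volumes = torch.randn(2, 1, 64, 64, 64)          # (batch, channel, D, H, W)
print(model(volumes).shape)                      # torch.Size([2, 4])
```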
Some of the impetus for the new tests comes from laboratories that wanted to use benchmarks to spec out the technology sold by vendors of supercomputing equipment, including chip makers Intel, Nvidia, and Advanced Micro Devices.
“We had been approached by a couple of supercomputing centers that were interested in using MLPerf training for bids, qualification and acceptance,” said Kanter. “Over a billion dollars of bids have used MLPerf components in the bidding process.”
Systems that took part include some of the fastest in the world, as measured by the Top 500 list of supercomputers. They include the Fugaku system at the RIKEN Center for Computational Science in Kobe, Japan, which was built by Fujitsu, and which is number one on the Top 500 list. Another entrant was Frontera-RTX, at the Texas Advanced Computing Center at the University of Texas, number nine on the list.
The benchmarks required some changes to the way performance is measured.
Unlike MLPerf training on server computers, which is scored on throughput, the number of images processed per second, where more is better, the supercomputer tasks are scored on the wall-clock time needed to reach a target accuracy, where less is better.
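In code terms, the measurement amounts to something like the sketch below: start a clock, train epoch by epoch, and stop when a validation metric crosses a fixed target. The target value and the two callables are hypothetical stand-ins, not part of the MLPerf rules.

```python
import time

# Sketch of a time-to-accuracy measurement: the clock runs until a
# validation metric crosses a fixed target, and the elapsed wall-clock
# time is the score. The 0.95 target and the two callables are
# hypothetical stand-ins for benchmark-specific machinery.
def time_to_accuracy(train_one_epoch, evaluate, target=0.95, max_epochs=100):
    start = time.perf_counter()
    for _ in range(max_epochs):
        train_one_epoch()                        # one pass over training data
        if evaluate() >= target:                 # validation metric at target
            return time.perf_counter() - start   # reported wall-clock time
    raise RuntimeError("target accuracy never reached")

# Toy usage: a fake model whose accuracy climbs one point per epoch.
acc = iter(0.90 + 0.01 * i for i in range(10))
print(time_to_accuracy(lambda: None, lambda: next(acc)))
```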
The shortest time to train a network on the CosmoFlow problem was thirteen minutes, achieved by the AI Bridging Cloud Infrastructure computer, or ABCI, housed at the National Institute of Advanced Industrial Science and Technology in Japan. The machine, the 14th most powerful in the world, was also developed by Fujitsu and features a combination of 1,024 Intel Xeon microprocessors and 2,048 Nvidia V100 GPUs.
A second ABCI system, using half the compute power, achieved the lowest time to train the image segmentation task on DeepCAM, taking just ten-and-a-half minutes.
But other changes are more specific to the nature of supercomputers and their work.
For example, input/output has to be measured more carefully in supercomputer benchmarks because it can have a greater influence on results. Unlike with typical MLPerf tasks such as ImageNet, “we have large scientific data sets, complex data structure, spatial-temporal data sets that may come from large HPC simulations,” said Steve Farrell, a machine learning engineer with NERSC who is a co-chair of the HPC effort for MLPerf.
“Data movement, that part of the story, is very important for HPC,” said Farrell. CosmoFlow and DeepCAM have data sets measuring five terabytes and nine terabytes, he noted.
“What we added to the rules is that any data movement from a general file system had to be included in the benchmark reported time, and we captured the time spent in the staging process,” he said.
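In other words, the clock starts before the data leaves the shared file system, roughly as in the sketch below, in which the staging and training routines are hypothetical stand-ins for the benchmark’s real machinery.

```python
import time

# Sketch of the staging rule Farrell describes: time spent copying data from
# a shared file system to node-local storage counts toward the reported
# result, and is also captured on its own. Both callables are hypothetical.
def benchmark_run(stage_dataset, run_training):
    start = time.perf_counter()

    stage_dataset()                              # copy data to local storage
    staging = time.perf_counter() - start        # captured separately...

    run_training()
    total = time.perf_counter() - start          # ...but included in total

    return {"staging_seconds": staging, "total_seconds": total}

# Toy usage with sleeps standing in for real staging and training.
print(benchmark_run(lambda: time.sleep(0.2), lambda: time.sleep(0.5)))
```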