
MLCommons unveils a new way to evaluate the world's fastest supercomputers

Nov 17, 2021 | Hi-network.com

MLCommons, the open engineering consortium that helps the industry track machine learning performance, is introducing a new metric designed to more accurately assess and compare the performance of the world's fastest supercomputers. The new metric is part of MLPerf HPC v1.0, the latest release of the group's ML training performance benchmark suite for high-performance computing (HPC). 


MLCommons released the inaugural MLPerf HPC results last year, measuring how quickly different systems could train a neural network. The initial benchmark suite has been used to measure systems that generally use between 500 and 4,000 processors or accelerators -- quite a bit smaller than the leading supercomputers. 

But while the initial version worked well for many scientifically oriented workloads, it didn't really scale up to full supercomputing capabilities. For instance, at scale, the interconnect begins to matter a lot more.

"It's important to keep in mind that small systems and large systems behave very differently," David Kanter, the head of MLPerf, said in a briefing with reporters. 

At supercomputer scale, most systems run multiple jobs -- such as training ML models -- in parallel. So, in addition to the time-to-train metric, MLCommons added a throughput metric. It measures how many models per minute a system can train -- "a very good proxy for the aggregate machine learning capabilities of a supercomputer," Kanter said. It captures the impact on shared resources, such as the storage system and interconnects. 

Submitters can choose the size and number of instances they test, allowing them to exhibit different supercomputing capabilities. For this release, submitters also had to report their strong-scaling result -- the time-to-train metric. 
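The difference between the two metrics can be illustrated with a small sketch. This is not MLCommons' scoring code; the function names and the run sizes are hypothetical, and it assumes a throughput run ends when the slowest concurrent instance reaches the target quality:

```python
# Illustrative sketch of the two MLPerf HPC v1.0 metrics (hypothetical code,
# not MLCommons' official scoring implementation).

def strong_scaling_time_to_train(train_seconds: float) -> float:
    """Strong scaling: wall-clock time, in minutes, for one model instance
    to train to the target quality."""
    return train_seconds / 60.0

def weak_scaling_throughput(instance_train_seconds: list[float]) -> float:
    """Weak scaling (throughput): models trained per minute when many
    instances train copies of the model concurrently. The run's wall-clock
    time is set by the slowest instance."""
    num_models = len(instance_train_seconds)
    wall_clock_minutes = max(instance_train_seconds) / 60.0
    return num_models / wall_clock_minutes

# Hypothetical run: 16 concurrent instances, each taking roughly 30 minutes.
times = [1800 + 10 * i for i in range(16)]
print(strong_scaling_time_to_train(times[0]))        # 30.0 (minutes)
print(round(weak_scaling_throughput(times), 3))      # 0.492 (models/minute)
```

The sketch shows why throughput stresses shared resources: all sixteen instances hit the storage system and interconnect at once, so the slowest instance -- not the average -- sets the score.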

For this benchmark round, MLCommons received submissions from eight supercomputing organizations, including Argonne National Laboratory, the Swiss National Supercomputing Centre, Fujitsu and Japan's Institute of Physical and Chemical Research (RIKEN), Helmholtz AI (a collaboration of the J

