MLCommons, a vendor-neutral, multi-stakeholder organization, is expanding its MLPerf AI benchmarks with inference testing for large language models (LLMs) and a new benchmark that measures the performance of storage systems under machine learning workloads. The additions aim to give vendors a level playing field for reporting on different aspects of AI performance and to highlight how that performance keeps improving with each update.

The newly released MLPerf Inference 3.1 benchmarks mark the second major update of the results this year, following the 3.0 results published in April. The update comprises more than 13,500 performance results submitted by companies including ASUSTeK, Azure, Dell, Google, Intel, Lenovo, Nvidia, Oracle, and Qualcomm, demonstrating industry-wide adoption of and engagement with the MLPerf benchmarks.

One common trend across the MLPerf benchmarks is steady performance improvement from one release to the next. According to MLCommons founder and executive director David Kanter, many submitters posted gains of 20% or more over the previous benchmark release, reflecting both the rapid pace of AI advances and vendors' commitment to pushing the boundaries of performance.

MLPerf continuously evolves its benchmark suite to track the latest developments in AI. To reflect the growing significance of generative AI, MLPerf introduced an LLM benchmark in the 3.1 release. The new test measures LLM inference performance on tasks that involve generating multiple sentences of output. MLPerf had previously added an LLM to its Training benchmarks, but training and inference are fundamentally different workloads and are measured separately.

The MLPerf Training benchmark for LLMs centers on training a very large foundation model (GPT-3), whereas the inference benchmark targets a broader set of use cases applicable to a wider range of organizations. Specifically, the inference benchmark uses the smaller GPT-J 6B parameter model to perform text summarization on the CNN/DailyMail dataset, a task relevant to organizations that lack the substantial computational resources and extensive datasets required to train frontier-scale models.
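To make the task concrete, here is a minimal, unofficial sketch of the kind of workload the new benchmark measures, using the publicly available GPT-J checkpoint and CNN/DailyMail dataset from Hugging Face. This is not the MLPerf reference harness, and the prompt format, sequence-length, and output-length settings below are illustrative choices rather than the official benchmark parameters.

```python
# Unofficial sketch of the MLPerf Inference 3.1 LLM task: GPT-J 6B
# summarizing a CNN/DailyMail article. Requires: pip install transformers datasets torch
# All settings here are illustrative, not MLPerf's official parameters.
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

MODEL_ID = "EleutherAI/gpt-j-6b"  # public checkpoint; MLPerf uses a fine-tuned variant
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Pull one article from the CNN/DailyMail validation split.
article = load_dataset("cnn_dailymail", "3.0.0", split="validation")[0]["article"]

# Inference for this task means generating many tokens autoregressively --
# the generation loop, not a single forward pass, dominates the cost.
prompt = f"Summarize the following news article:\n\n{article}\n\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens (the summary).
summary = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```

In the actual benchmark, a test harness issues many such queries under defined latency and throughput scenarios and scores the generated summaries against reference text; the sketch above shows only the single-query inference step.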

While high-end GPU accelerators tend to dominate the MLPerf listings for both training and inference, Intel points to the diverse compute needs of organizations. Intel silicon, including Habana Gaudi accelerators, 4th Gen Intel Xeon Scalable processors, and Intel Xeon CPU Max Series processors, performed well in the MLPerf Inference 3.1 benchmarks. Speaking during the MLCommons press briefing, Intel senior director of AI products Jordan Plawner emphasized that organizations need to deploy AI in production on different types of compute, making both software and hardware part of the solution.

Nvidia’s GPUs also figure prominently in the MLPerf Inference 3.1 results, including the notable debut of the Nvidia GH200 Grace Hopper Superchip. The GH200 pairs an Arm-based Grace CPU with a Hopper GPU in a single module to optimize AI workloads, and it delivered up to 17% more performance than Nvidia’s H100 GPU submissions. Nvidia’s L4 GPUs also performed strongly, outpacing the best x86 CPU submissions by up to 6x. The results underscore Nvidia’s focus on high-performance solutions for AI inference.

MLCommons’ expansion of the MLPerf AI benchmarks with LLM inference testing and a storage performance benchmark reflects the organization’s goal of providing a fair, level playing field for AI performance reporting. The Inference 3.1 results, with more than 13,500 submissions, show continued performance improvement across vendors, while the new LLM benchmark captures the growing importance of generative AI models and the broader use cases organizations face. Through the contributions of companies including Intel and Nvidia, MLPerf continues to push the boundaries of AI performance and foster innovation in the field.
