NVIDIA Blackwell dominates MLPerf 6.0 benchmarks

NVIDIA’s Blackwell GPUs have swept every benchmark in the latest MLPerf Training 6.0 results, with no serious competition in sight.

MLCommons released the newest round of its open, peer-reviewed AI performance tests, adding two mixture-of-experts (MoE) models: the large-scale DeepSeek V3 (671B parameters) and the entry-level GPT-OSS 20B (21B parameters). They are the first MoE architectures in the suite, reflecting the growing adoption of MoE designs for both massive and compact AI workloads. NVIDIA submitted results for all seven benchmarks and set the fastest training times across the board.

In the Llama 3.1 8B test, NVIDIA’s Blackwell GB300 finished in 4.46 minutes. The next-best submission took 58.63 minutes. For Llama 2 70B LoRA, the gap was smaller in absolute terms but still sharp: 0.40 minutes versus 8.27 minutes.
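
The two gaps are easier to compare as ratios than as raw minutes. A minimal Python sketch, using only the times quoted here (the `speedup` and `summarize` helpers are illustrative names, not part of any MLPerf tooling), computes how many times faster each Blackwell result is and how many minutes it saves:

```python
# Training times in minutes, as quoted in the article.
RESULTS = {
    "Llama 3.1 8B": {"blackwell": 4.46, "next_best": 58.63},
    "Llama 2 70B LoRA": {"blackwell": 0.40, "next_best": 8.27},
}


def speedup(fast_minutes: float, slow_minutes: float) -> float:
    """Return how many times faster the quicker run is (slow / fast)."""
    if fast_minutes <= 0:
        raise ValueError("training time must be positive")
    return slow_minutes / fast_minutes


def summarize(results: dict) -> dict:
    """Map each benchmark name to (speedup ratio, minutes saved)."""
    return {
        name: (
            speedup(t["blackwell"], t["next_best"]),
            t["next_best"] - t["blackwell"],
        )
        for name, t in results.items()
    }


if __name__ == "__main__":
    for name, (ratio, saved) in summarize(RESULTS).items():
        print(f"{name}: {ratio:.1f}x faster, {saved:.2f} minutes saved")
```

On these figures the Llama 3.1 8B lead works out to roughly 13x and the Llama 2 70B LoRA lead to roughly 21x, so the second gap is the smaller one in minutes but the larger one as a ratio.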

Some tests saw no rival submissions at all. DeepSeek V3 671B and GPT-OSS 20B had only NVIDIA entries, so no competing hardware was shown handling these workloads at scale. The company also ran the largest training cluster yet: 8,192 GPUs on Microsoft Azure, hitting the quality target for Llama 3.1 405B in 7.07 minutes. The run demonstrated that NVIDIA’s NVL72 systems can scale in cloud environments.

MLPerf lets any vendor submit hardware for testing. So far, none has matched NVIDIA’s pace. AMD, for instance, did not submit results for its newer MI350 series in any of the new or existing benchmarks, leaving the field largely to NVIDIA’s Blackwell-based systems.

Performance gains aren’t just from new chips. NVIDIA’s GB300 systems, using the same NVL72 configuration as GB200, are now up to 60% faster thanks to higher AI compute density and NVFP4 optimizations. These improvements stem from architectural refinements, including enhanced tensor core efficiency and memory bandwidth utilization, which reduce training latency without increasing GPU count.
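
"Up to 60% faster" can be read in more than one way. Assuming it means 60% higher training throughput (an interpretation, not something stated in the results), the short Python sketch below converts that gain into the corresponding cut in time-to-train. The function name and the 10-minute baseline are hypothetical and chosen only for illustration:

```python
def time_after_speedup(baseline_minutes: float, percent_faster: float) -> float:
    """Time-to-train after a throughput gain of `percent_faster` percent.

    Assumes "X% faster" means throughput is multiplied by (1 + X/100),
    so the time is divided by the same factor.
    """
    if baseline_minutes < 0 or percent_faster <= -100:
        raise ValueError("invalid baseline or speedup")
    return baseline_minutes / (1 + percent_faster / 100)


if __name__ == "__main__":
    baseline = 10.0  # hypothetical baseline run, in minutes
    new_time = time_after_speedup(baseline, 60)
    saved_pct = 100 * (1 - new_time / baseline)
    print(f"{baseline:.2f} min -> {new_time:.2f} min ({saved_pct:.1f}% less time)")
```

Under that reading, a 60% throughput gain shortens a run by 37.5%, not by 60%.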

With the Vera Rubin platform approaching and software tuning ongoing, the gap shows no signs of closing. Vera Rubin is expected to add further architectural and software enhancements on top of Blackwell’s already dominant showing in MLPerf Training 6.0.