Xiaomi's 7B Model Tops MMAU with DeepSeek-R1 Algorithm

Xiaomi's 7B model achieves 64.5% accuracy on MMAU using DeepSeek-R1's GRPO algorithm, surpassing GPT-4o. Explore the future of audio understanding with reinforcement learning.

Mar 17, 2025



Can a small 7B model trained on just 38,000 samples take the top spot on MMAU, the benchmark for audio understanding and reasoning?

Inspired by the reinforcement learning algorithm in DeepSeek-R1, Xiaomi's large model team fine-tuned Alibaba's Qwen2-Audio-7B model.
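The algorithm in question is GRPO (Group Relative Policy Optimization). Its core idea, as described in DeepSeek's papers, is to sample several answers to the same prompt and score each one relative to the rest of its group, rather than training a separate value model. The sketch below shows only that advantage step, under stated assumptions: the function name, the reward values, and the use of population standard deviation are illustrative choices, not details of Xiaomi's training code.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per sampled answer to the same prompt.
    `eps` (an assumed small constant) avoids division by zero when every
    answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards: 1.0 for a correct answer, 0.0 for a wrong one.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat the group average get a positive advantage and are reinforced; answers below it get a negative one. If every answer in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.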

As a result, the model's accuracy on MMAU rose from 49.2% to 64.5%, a gain of 15.3 percentage points (a 31% relative improvement), putting it nearly 10 percentage points ahead of GPT-4o, the previous leader.
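The 31% figure is a relative improvement, which is easy to confuse with the gain in percentage points. A quick check using the two accuracy numbers quoted above (the function names are just for illustration):

```python
def absolute_gain(before, after):
    """Gain in percentage points between two accuracy scores (given in %)."""
    return after - before

def relative_gain(before, after):
    """Gain as a percentage of the starting score."""
    return (after - before) / before * 100

baseline, tuned = 49.2, 64.5  # MMAU accuracy before and after fine-tuning

print(f"{absolute_gain(baseline, tuned):.1f} percentage points")    # 15.3 percentage points
print(f"{relative_gain(baseline, tuned):.1f}% relative improvement")  # 31.1% relative improvement
```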

MMAU is a highly challenging benchmark of 10,000 audio samples covering speech, environmental sounds, and music. Even human experts score only 82.2% on it.

Continue reading this post for free, courtesy of Meng Li.
