Xiaomi's 7B Model Tops MMAU with DeepSeek-R1 Algorithm
Xiaomi's 7B model achieves 64.5% accuracy on MMAU using DeepSeek-R1's GRPO algorithm, surpassing GPT-4o. Explore the future of audio understanding with reinforcement learning.
"AI Disruption" Publication 5000 Subscriptions 20% Discount Offer Link.
Can a small 7B-parameter model, trained on just 38,000 samples, dethrone the leader on MMAU, the benchmark for audio understanding and reasoning?
Inspired by the reinforcement learning algorithm in DeepSeek-R1, Xiaomi's large model team fine-tuned Alibaba's Qwen2-Audio-7B model.
As a result, the model's accuracy on MMAU rose from 49.2% to 64.5%, a gain of 15.3 percentage points (about 31% in relative terms), putting it nearly 10 percentage points ahead of the previously dominant GPT-4o.
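The 31% figure is a relative gain, not a difference in percentage points, and the two are easy to confuse. A minimal sketch of the arithmetic follows; the helper name `improvement` is ours, chosen for illustration, and only the two accuracy values come from the article.

```python
def improvement(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before            # difference between the two accuracies
    relative = absolute / before * 100   # gain as a share of the starting accuracy
    return absolute, relative


# Accuracy on MMAU before and after fine-tuning, as reported above.
abs_gain, rel_gain = improvement(49.2, 64.5)
print(f"{abs_gain:.1f} points, {rel_gain:.1f}% relative")
# prints "15.3 points, 31.1% relative"
```

The absolute gain is the number to compare against other models' scores, while the relative gain describes how much the fine-tuning improved on its own baseline.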
MMAU is a highly challenging benchmark of 10,000 audio samples covering speech, environmental sounds, and music. Human experts score 82.2% on it.