Alibaba’s QwQ-32B: 1/20 the Parameters, DeepSeek R1-Level Performance
Discover Alibaba's QwQ-32B: a 32B-parameter reasoning model delivering DeepSeek R1-level performance through scaled reinforcement learning.
Today, Alibaba open-sourced its new reasoning model, QwQ-32B, which has 32 billion parameters yet delivers performance rivaling the full 671-billion-parameter version of DeepSeek-R1.
On X, the Qwen team stated:
“This time, we have explored recipes for scaling RL, and based on our Qwen2.5-32B, we have achieved some impressive results. We found that RL training can continuously improve performance, especially on math and coding tasks, and we observed that continuously scaling RL can help medium-sized models achieve performance competitive with giant MoE models. Feel free to chat with our new model and give us feedback!”
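For readers who want to try the model themselves, here is a minimal sketch of chatting with QwQ-32B locally via Hugging Face Transformers. The repository ID `Qwen/QwQ-32B` and the chat-template workflow are assumptions based on the Qwen team's usual release conventions, not details confirmed by this article.

```python
# A minimal sketch for chatting with QwQ-32B, assuming the model is
# published on Hugging Face under "Qwen/QwQ-32B" (unconfirmed here).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native precision
    device_map="auto",    # spread layers across available GPUs
)

# Build a single-turn chat prompt using the model's chat template.
messages = [{"role": "user", "content": "How many r's are in the word \"strawberry\"?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow a generous budget.
output_ids = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Note that a 32B-parameter model still needs substantial GPU memory to run in full precision; quantized variants or a hosted API are common fallbacks for smaller machines.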