AI Disruption

Is DeepSeek's R1-Zero More Worthy of Attention Than R1?

R1-Zero by DeepSeek could revolutionize AI by eliminating the need for human-labeled data, relying fully on reinforcement learning for self-evolution and reasoning.

Meng Li · Jan 30, 2025

"AI Disruption" publication New Year 30% discount link.


Are models like R1-Zero breaking the human data bottleneck and ushering in a new paradigm of AI self-evolution?

Compared to R1, the recently released R1-Zero by DeepSeek deserves more attention.

R1-Zero is worth analyzing more closely than R1 because it relies entirely on reinforcement learning (RL) rather than supervised fine-tuning (SFT) on human expert-labeled data. This suggests that for certain tasks human labeling may not be necessary, and that broader reasoning capabilities might eventually be achievable through RL alone.
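
To make the contrast with SFT concrete, here is a minimal, self-contained toy sketch of RL-only training: a tiny softmax policy learns to answer arithmetic questions from a rule-based correctness reward alone, with a group-relative baseline loosely inspired by the GRPO-style setup reported for R1-Zero. This is not DeepSeek's implementation; the tasks, policy, and hyperparameters are illustrative assumptions.

```python
# Toy sketch of RL-only training with a rule-based reward (no SFT stage).
# NOT DeepSeek's implementation: a REINFORCE-style loop over a tiny softmax
# "policy" that picks an answer to arithmetic questions, using a
# group-relative baseline loosely inspired by GRPO.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tasks: each question has a checkable answer, so the reward
# comes from a rule (exact match) with no human labels.
tasks = [("2+3", 5), ("4+4", 8), ("1+6", 7)]
candidates = list(range(10))                     # possible answers
theta = np.zeros((len(tasks), len(candidates)))  # per-task logits (toy policy)

def sample_group(task_idx, group_size=8):
    """Sample a group of answers from the current policy for one prompt."""
    logits = theta[task_idx]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(candidates), size=group_size, p=probs), probs

for step in range(300):
    for t_idx, (question, answer) in enumerate(tasks):
        samples, probs = sample_group(t_idx)
        # Rule-based reward: 1 if the sampled answer is exactly correct.
        rewards = np.array([1.0 if candidates[s] == answer else 0.0 for s in samples])
        advantages = rewards - rewards.mean()    # group-relative baseline
        # Policy-gradient update: raise log-prob of advantaged samples.
        for s, adv in zip(samples, advantages):
            grad = -probs.copy()
            grad[s] += 1.0                       # d log pi(s) / d logits
            theta[t_idx] += 0.1 * adv * grad

for t_idx, (question, answer) in enumerate(tasks):
    print(question, "->", candidates[int(theta[t_idx].argmax())], "(target:", answer, ")")
```

The key point of the sketch is that the learning signal comes only from an automatic verifier, never from human-written demonstrations.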

Additionally, the success of both R1 and R1-Zero reveals several insights:

  • Investing more computational resources can significantly improve the accuracy and reliability of AI systems, which increases user trust and drives commercial adoption.

  • The reasoning process generates large amounts of high-quality training data, and users pay for the inference that produces it. This new "reasoning as training" paradigm could fundamentally change how the AI data economy operates, creating a self-reinforcing cycle (see the sketch after this list).
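
Here is a hedged sketch of that self-reinforcing loop, under the assumption that correctness can be checked automatically (as with math or code): traces generated while serving queries are scored by a rule-based verifier, and only verified traces are kept as new training data. The function names (`run_model`, `verifier`) are hypothetical placeholders, not DeepSeek's pipeline.

```python
# Toy sketch of a "reasoning as training" data loop: outputs produced at
# inference time are scored by an automatic verifier, and only high-reward
# traces are recycled as training examples for the next round.
import random

random.seed(0)

def run_model(prompt: str) -> dict:
    """Stand-in for serving a reasoning model: returns a trace and an answer."""
    a, b = map(int, prompt.split("+"))
    guess = a + b + random.choice([0, 0, 0, 1])  # occasionally wrong on purpose
    return {"prompt": prompt, "trace": f"{a}+{b}={guess}", "answer": guess}

def verifier(sample: dict) -> float:
    """Rule-based reward: exact match against the checkable ground truth."""
    a, b = map(int, sample["prompt"].split("+"))
    return 1.0 if sample["answer"] == a + b else 0.0

training_set = []
for prompt in ["2+3", "7+5", "9+9", "4+8"]:
    for _ in range(4):                           # several traces per paid query
        sample = run_model(prompt)
        if verifier(sample) == 1.0:              # keep only verified reasoning
            training_set.append(sample)

print(f"collected {len(training_set)} verified traces for the next training round")
```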
