DeepSeek Unveils New Paper on Inference-Time Scaling, Is R2 Coming?
DeepSeek's new Self-Principled Critique Tuning (SPCT) boosts AI reward models. Is R2 coming? Read the arXiv paper now!
"AI Disruption" Publication 5500 Subscriptions 20% Discount Offer Link.
A brand-new learning method.
Could this be the prototype of DeepSeek R2? The latest paper DeepSeek submitted to arXiv this Friday is steadily gaining attention in the AI community.
Currently, reinforcement learning (RL) is widely applied to the post-training of large language models (LLMs).
Recent results on incentivizing LLM reasoning capabilities through RL suggest that appropriate learning methods can achieve effective inference-time scalability. A key challenge for RL, however, is obtaining accurate reward signals for LLMs across diverse domains, beyond verifiable problems or human-defined rules.
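To make that challenge concrete, here is a minimal sketch (hypothetical names, not from the DeepSeek paper) contrasting a verifiable domain, where a fixed rule can compute the reward, with an open-ended query, where no such rule exists and a learned reward model would have to judge quality instead:

```python
# Minimal sketch (hypothetical names, not from the paper): rule-based rewards
# work for verifiable tasks but break down on open-ended ones.

def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Rule-based reward: trivial to compute when a ground truth exists."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

# Verifiable domain: a math problem with a known answer -- a fixed rule suffices.
print(verifiable_reward("42", "42"))  # 1.0

# Open-ended domain: no reference answer or checkable rule is available,
# so the reward signal must come from a learned reward model instead.
open_ended_query = "Summarize the trade-offs of RL post-training for LLMs."
candidate_response = "RL post-training improves alignment but needs reward signals..."
# reward = reward_model.score(open_ended_query, candidate_response)  # learned, not rule-based
```

This gap between rule-checkable and open-ended domains is exactly what generalist reward modeling aims to close.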