AI Disruption

DeepSeek Unveils New Paper on Inference-Time Scaling, Is R2 Coming?

DeepSeek's new Self-Principled Critique Tuning (SPCT) boosts AI reward models. Is R2 coming? Read the arXiv paper now!

Meng Li's avatar
Meng Li
Apr 04, 2025
∙ Paid

"AI Disruption" Publication 5500 Subscriptions 20% Discount Offer Link.


A brand-new learning method.

Could this be the prototype of DeepSeek R2? The latest paper DeepSeek submitted to arXiv this Friday has been gaining traction in the AI community.

Currently, reinforcement learning (RL) is widely applied to the post-training of large language models (LLMs).

Recent work on incentivizing LLM reasoning capabilities with RL suggests that appropriate learning methods can deliver effective inference-time scalability.

A key challenge of RL is obtaining accurate reward signals for LLMs across diverse domains, beyond verifiable problems or human-defined rules.
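To make that distinction concrete, here is a minimal sketch contrasting a rule-based reward (usable only where answers are verifiable) with a learned reward model that scores open-ended responses. All names and the toy scoring heuristic are illustrative assumptions, not DeepSeek's actual SPCT method or API.

```python
# Two ways to produce a reward signal for RL post-training of an LLM.
# Everything here is a simplified illustration, not code from the paper.

def verifiable_reward(answer: str, ground_truth: str) -> float:
    """Rule-based reward: applicable only when correctness is checkable,
    e.g. math problems with a known answer."""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

def reward_model_score(prompt: str, answer: str) -> float:
    """Stand-in for a learned generalist reward model (the kind SPCT aims
    to improve): scores open-ended responses where no fixed rule applies.
    Here it is just a toy heuristic rewarding non-empty, bounded-length
    answers, purely for illustration."""
    return min(len(answer.strip()) / 100.0, 1.0)

# A verifiable task yields a crisp 0/1 signal:
r1 = verifiable_reward("42", "42")        # 1.0
# An open-ended task needs a model-produced scalar instead:
r2 = reward_model_score("Summarize RL.", "RL optimizes policies via reward.")
```

The point of a generalist reward model is that `reward_model_score` must remain accurate across domains where nothing like `ground_truth` exists, which is exactly the challenge the paper targets.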

This post is for paid subscribers

© 2025 Meng Li