AI Disruption

Kimi Agent Achieves New SOTA on "Humanity's Last Exam"

Moonshot AI's Kimi-Researcher reaches a state-of-the-art 26.9% Pass@1 on Humanity's Last Exam using end-to-end agent reinforcement learning, outperforming models such as o3.

Meng Li
Jun 23, 2025




Yesterday, Moonshot AI released the autonomous Agent Kimi-Researcher.

This Agent excels at multi-round search and reasoning: on average, it performs 23 reasoning steps and accesses more than 200 websites per task.

It is built on an internal version of the Kimi k-series model and trained entirely through end-to-end agent reinforcement learning, which makes it one of the few Agents built on its developer's own proprietary model.

On "Humanity's Last Exam" (HLE), Kimi-Researcher achieved a Pass@1 score of 26.9%, a new state-of-the-art (SOTA) result, and a Pass@4 accuracy of 40.17%. Its HLE score started at 8.6% and rose to 26.9% almost entirely through end-to-end reinforcement learning, a gain of 18.3 percentage points that demonstrates the potential of this approach for improving Agent intelligence.
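For readers unfamiliar with the Pass@1 and Pass@4 figures, here is a minimal sketch of how pass@k is commonly estimated. It assumes the widely used unbiased estimator (n attempts per question, c of them correct); the post does not say how Moonshot AI computed its numbers, so this is an illustration and not their evaluation code. The function names and the sample data are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one question from n attempts, c of them correct.

    Uses the common unbiased estimator 1 - C(n-c, k) / C(n, k): the chance
    that a random subset of k attempts contains at least one correct answer.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k wrong attempts exist, so any k attempts include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-question pass@k estimate over a whole benchmark."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical data (not from the post): 5 questions, 4 attempts each,
# with the number of correct attempts recorded per question.
correct_counts = [0, 1, 4, 0, 2]

p1 = benchmark_pass_at_k(correct_counts, n=4, k=1)
p4 = benchmark_pass_at_k(correct_counts, n=4, k=4)
print(f"pass@1 = {p1:.2f}, pass@4 = {p4:.2f}")
```

With k = 1 the estimate reduces to the average fraction of correct attempts, and with k = n it is the share of questions answered correctly at least once. That is why a Pass@4 figure is always at least as high as Pass@1 on the same runs.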

Kimi-Researcher also performed strongly on several complex, highly challenging real-world benchmarks.

On xbench (a new, dynamic, professionally aligned benchmark suite designed to connect AI capabilities with practical productivity), Kimi-Researcher achieved an average Pass@1 score of 69% on the xbench-DeepSearch subtask (averaged over 4 runs), surpassing models such as o3 with search tools.

Kimi-Researcher also delivered strong results on multi-round search reasoning benchmarks (e.g., FRAMES, Seal-0) and factual information retrieval benchmarks (e.g., SimpleQA).
