Kimi Agent Achieves New SOTA on "Humanity's Last Exam"
Moonshot AI's Kimi-Researcher scores 26.9% Pass@1 on Humanity's Last Exam, a new SOTA, using end-to-end agent reinforcement learning and outperforming models like o3.
"AI Disruption" Publication 6900 Subscriptions 20% Discount Offer Link.
Yesterday, Moonshot AI released the autonomous Agent Kimi-Researcher.
This Agent excels at multi-round search and reasoning, performing an average of 23 reasoning steps per task and accessing over 200 websites.
It is built on an internal version of the Kimi k-series model and trained entirely through end-to-end agent reinforcement learning, making it one of the few Agents built on a proprietary model.
On "Humanity's Last Exam" (HLE), Kimi-Researcher achieved a Pass@1 score of 26.9%, a new state-of-the-art (SOTA) result, along with a Pass@4 accuracy of 40.17%. Starting from an initial HLE score of 8.6%, Kimi-Researcher improved to 26.9% almost entirely through end-to-end reinforcement learning, demonstrating the considerable potential of this approach for improving Agent intelligence.
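For readers unfamiliar with the Pass@1 and Pass@4 figures above, here is a minimal sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n answers are sampled for a task and c of them are correct. The article does not say how Moonshot AI computed its numbers, so treat this as an illustration of the metric in general rather than their evaluation code. The function names and the toy data are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task, given n sampled answers of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n samples contains no correct answer; math.comb returns 0
    # when k > n - c, so the estimate is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-task pass@k over a benchmark of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical toy benchmark (not HLE data): three tasks, four samples each,
# with 0, 1, and 4 correct answers respectively.
toy_results = [(4, 0), (4, 1), (4, 4)]

print(f"pass@1 = {mean_pass_at_k(toy_results, 1):.3f}")
print(f"pass@4 = {mean_pass_at_k(toy_results, 4):.3f}")
```

In this toy data, the task with one correct answer out of four contributes only 0.25 to pass@1 but a full 1.0 to pass@4. This is why a Pass@4 score is never lower than the corresponding Pass@1 score.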
Kimi-Researcher also performed strongly on several complex and highly challenging real-world benchmarks.
On xbench (a new dynamic, professionally aligned suite designed to integrate AI capabilities with practical productivity), Kimi-Researcher achieved a Pass@1 score of 69% on the xbench-DeepSearch subtask (averaged over 4 runs), surpassing models like o3 with search tools.
On benchmarks for multi-round search reasoning (e.g., FRAMES, Seal-0) and factual information retrieval (e.g., SimpleQA), Kimi-Researcher also delivered outstanding results.