AI Disruption

OpenAI o3 Medium: The New "Cost-Effective King"? ARC-AGI Results Show Double Score at 1/20 Cost

OpenAI's o3 model doubles ARC-AGI scores at 1/20 cost! New benchmark results reveal its cost-performance dominance. Is this the AI efficiency king?

Meng Li
Apr 23, 2025


OpenAI Releases o3 in April: Twice the Runner-Up's Score at Only 1/20 the Cost?!

The latest results for o3 (Medium) on the notoriously difficult ARC-AGI reasoning benchmark came as a genuine surprise.

According to the official ARC Prize announcement, the key conclusions from this round of testing are as follows:

  • o3 (Medium) scored 57% on ARC-AGI-1 at a cost of $1.50 per task, outperforming all known Chain-of-Thought (CoT) reasoning models.

  • o4-mini (Medium) scored 42% on ARC-AGI-1 at a cost of $0.23 per task: lower accuracy, but a significant cost advantage.

  • On the more difficult ARC-AGI-2, both models scored below 3%.
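
To make the cost-performance trade-off concrete, here is a minimal sketch that works only from the two ARC-AGI-1 figures quoted above. The `Result` class, the `points_per_dollar` metric, and the `compare` helper are illustrative names, not part of any ARC Prize tooling, and "score points per dollar" is just one rough way to frame efficiency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One reported ARC-AGI-1 result (figures taken from the list above)."""
    model: str
    score_pct: float          # accuracy in percent
    cost_per_task_usd: float  # reported cost per task in USD

    @property
    def points_per_dollar(self) -> float:
        # Illustrative efficiency metric: percentage points per dollar per task.
        return self.score_pct / self.cost_per_task_usd


O3_MEDIUM = Result("o3 (Medium)", 57.0, 1.50)
O4_MINI_MEDIUM = Result("o4-mini (Medium)", 42.0, 0.23)


def compare(a: Result, b: Result) -> dict:
    """Return how many times higher a's score and cost are relative to b's."""
    return {
        "score_ratio": a.score_pct / b.score_pct,
        "cost_ratio": a.cost_per_task_usd / b.cost_per_task_usd,
    }


if __name__ == "__main__":
    for r in (O3_MEDIUM, O4_MINI_MEDIUM):
        print(f"{r.model}: {r.points_per_dollar:.1f} points per dollar")
    ratios = compare(O3_MEDIUM, O4_MINI_MEDIUM)
    print(f"o3 vs o4-mini: {ratios['score_ratio']:.2f}x the score "
          f"at {ratios['cost_ratio']:.2f}x the cost")
```

By this rough measure, o3 (Medium) buys about 1.4 times the score of o4-mini (Medium) for about 6.5 times the per-task cost, which is the accuracy-versus-cost trade-off the two bullets describe.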
