OpenAI o3 Medium: The New "Cost-Effective King"? ARC-AGI Results Show Double Score at 1/20 Cost

Apr 23, 2025


OpenAI Releases o3 in April: Double the Runner-Up's Score at Only 1/20 the Cost?

The latest results for o3 (Medium) on the notoriously difficult ARC-AGI reasoning benchmark came as a genuine surprise.

According to the official ARC Prize announcement, the key conclusions from this round of testing are as follows:

o3 (Medium) scored 57% on ARC-AGI-1 at a cost of $1.50 per task, outperforming all known Chain-of-Thought (CoT) reasoning models.
o4-mini (Medium) scored 42% on ARC-AGI-1 at a cost of $0.23 per task: lower accuracy, but a significant cost advantage.
On the more difficult ARC-AGI-2, both models scored below 3%.
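
To make the accuracy-versus-cost trade-off concrete, here is a minimal Python sketch that compares the two ARC-AGI-1 results quoted above. The "cost per percentage point" metric and the names `Result` and `compare` are illustrative assumptions, not something ARC Prize reports; only the scores and per-task costs come from the announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One ARC-AGI-1 result as quoted in this post (hypothetical container)."""
    model: str
    score_pct: float          # accuracy on ARC-AGI-1, in percent
    cost_per_task_usd: float  # reported cost per task, in US dollars

    @property
    def cost_per_point(self) -> float:
        # Illustrative metric (an assumption): dollars per task
        # divided by percentage points of accuracy.
        return self.cost_per_task_usd / self.score_pct


def compare(a: Result, b: Result) -> dict:
    """Return how a's score and cost relate to b's, as simple ratios."""
    return {
        "score_ratio": a.score_pct / b.score_pct,
        "cost_ratio": a.cost_per_task_usd / b.cost_per_task_usd,
    }


# Figures taken from the bullet points above.
o3 = Result("o3 (Medium)", 57.0, 1.50)
o4_mini = Result("o4-mini (Medium)", 42.0, 0.23)

for r in (o3, o4_mini):
    print(f"{r.model}: {r.score_pct:.0f}% at ${r.cost_per_task_usd:.2f}/task "
          f"(${r.cost_per_point:.4f} per point)")

ratios = compare(o3, o4_mini)
print(f"o3 vs o4-mini: {ratios['score_ratio']:.2f}x the score "
      f"at {ratios['cost_ratio']:.2f}x the cost")
```

On these numbers, o3 (Medium) delivers roughly 1.36 times the score of o4-mini (Medium) at roughly 6.5 times the per-task cost, which is why which model counts as the better deal depends on how much the extra accuracy is worth to you.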
