AI Disruption

OpenAI's New Model o3: These 34 Questions Stump Me

Explore OpenAI's groundbreaking o3 model, the first AI model to break through on the ARC-AGI benchmark, as it tackles complex tasks and reveals the challenges that remain on the road to AGI.

Meng Li
Dec 29, 2024
Embedded video: "o3 Model by OpenAI TESTED ($1800+ per task)"

A narrow miss: even at its best, o3 falls 12.5 percentage points short of a perfect score.

A few days ago, OpenAI wrapped up its run of 12 consecutive daily updates with the final announcement: as anticipated, it introduced the new reasoning models o3 and o3-mini.

Starting with o1, the reasoning scaling laws OpenAI proposed, under which a model performs better the more computation it spends reasoning at inference time, have brought new hope for achieving AGI. The benchmark used to evaluate o3's reasoning capabilities is ARC-AGI, which has been around for five years and had remained unsolved until now.

The new model, o3, is the first AI model to break through on the ARC-AGI benchmark: it scored 75.7% in its lowest-compute configuration, and with more computational resources and longer processing time it reached as high as 87.5%.

By comparison, o1 had previously scored only 25% to 32% on the same benchmark.

Each ARC-AGI task requires the AI to infer a pattern from a few paired input-output examples (small grids of colored cells) and then predict the output for a new input.
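
To make the task format concrete, here is a minimal Python sketch. It assumes a made-up task and a tiny, hypothetical library of grid transformations (`flip_horizontal`, `rotate_90`, and so on are illustrative names, not part of ARC-AGI); the solver simply returns the first candidate rule that reproduces every training output and applies it to the test input. Real ARC-AGI tasks are meant to demand rules that a fixed list like this would not contain, and this is not how o3 works.

```python
# Toy ARC-style task: grids are lists of lists of small integers (colors).
# The task data and candidate rules are hypothetical illustrations only.
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]

def flip_horizontal(g: Grid) -> Grid:
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in g]

def flip_vertical(g: Grid) -> Grid:
    """Mirror the grid top to bottom."""
    return [list(row) for row in reversed(g)]

def transpose(g: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(col) for col in zip(*g)]

def rotate_90(g: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(col) for col in zip(*g[::-1])]

# Hypothetical hypothesis space, checked in insertion order.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
    "rotate_90": rotate_90,
}

def solve(train: List[Tuple[Grid, Grid]], test_input: Grid) -> Optional[Tuple[str, Grid]]:
    """Return (rule name, predicted output) for the first rule consistent with all training pairs."""
    for name, rule in CANDIDATES.items():
        if all(rule(inp) == out for inp, out in train):
            return name, rule(test_input)
    return None  # no candidate explains the examples

# Two "input-output" training pairs and one test input (made-up data).
train_pairs = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[5, 5, 0], [0, 4, 4]], [[0, 5, 5], [4, 4, 0]]),
]
test_grid = [[7, 0, 0], [0, 8, 9]]
print(solve(train_pairs, test_grid))  # ('flip_horizontal', [[0, 0, 7], [9, 8, 0]])
```

The point of the benchmark is that each task hides a different rule, so success depends on inferring new rules from a handful of examples rather than on enumerating known ones.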

François Chollet, creator of both ARC-AGI and Keras, stated in the test report that, despite the high costs, the results confirm that performance on novel tasks improves with increased computation.

For o3, each task costs $17 to $20 in low-compute mode, while high-compute mode can cost thousands of dollars per task.
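
A rough way to weigh cost against accuracy is to divide the cost per attempted task by the fraction of tasks solved. The Python sketch below assumes that the $17 to $20 figure and the 75.7% score describe the same low-compute run, and it uses a hypothetical 100-task evaluation purely as a round number; `cost_per_solved_task` is an illustrative helper, not an official metric.

```python
# Rough arithmetic: dollars spent per correctly solved task.
# Assumptions: the $17 to $20 per-task cost and the 75.7% score come from the
# same low-compute run; the 100-task evaluation size is hypothetical.

def cost_per_solved_task(cost_per_task: float, accuracy: float) -> float:
    """Spend per correct answer = cost per attempted task / fraction solved."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_task / accuracy

LOW_COMPUTE_ACCURACY = 0.757
HYPOTHETICAL_TASKS = 100

for cost in (17.0, 20.0):
    total = cost * HYPOTHETICAL_TASKS
    per_solved = cost_per_solved_task(cost, LOW_COMPUTE_ACCURACY)
    print(f"${cost:.0f}/task: ${total:,.0f} for {HYPOTHETICAL_TASKS} tasks, "
          f"about ${per_solved:.2f} per solved task")
```

Run as-is, this prints about $22.46 and $26.42 per solved task ($1,700 and $2,000 in total for the hypothetical 100 tasks). Applying the same division to high-compute mode, where a single task can cost thousands of dollars, makes the cost side of the trade-off obvious.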

This post is for paid subscribers

© 2025 Meng Li