OpenAI Open-Sources PaperBench: Claude Leads, Reshaping AI Benchmarking
OpenAI's PaperBench reveals Claude-3.5-Sonnet dominates AI paper replication, surpassing GPT-4o & o1. New benchmark tests full agent capabilities beyond single tasks.
"AI Disruption" Publication 5500 Subscriptions 20% Discount Offer Link.
OpenAI acknowledges that Claude is the best.
The newly open-sourced benchmark PaperBench pits six cutting-edge large model-driven agents against each other to reproduce top-tier AI conference papers, with the new Claude-3.5-Sonnet significantly surpassing o1/r1 to take first place.