Grok 4 Leaked Scores: 45% on HLE, Double Gemini 2.5
Grok 4 benchmark scores leaked: 45% on HLE test, doubles Gemini 2.5 performance. Grok 4 Code hits 72-75% on SWE Bench. Musk's tent camping development pays off?
Are Musk's tent camping and all-night development sessions paying off? The benchmark scores are this high, yet there is still no release.
Just now, benchmark test results for Grok 4 and Grok 4 Code were allegedly leaked.
X user @legit_api posted that Grok 4 scored 35% on HLE (Humanity's Last Exam) in standard mode, rising to 45% with reasoning techniques, and 87-88% on GPQA, while Grok 4 Code reached 72-75% on SWE Bench.
What do these benchmark results mean? Some users compared them with competing models such as OpenAI o3 and Claude Opus 4.