Grok 4 Leaked Scores: 45% on HLE, Double Gemini 2.5

Grok 4 benchmark scores have allegedly leaked: 45% on HLE, double Gemini 2.5's score. Grok 4 Code hits 72-75% on SWE-bench. Is Musk's tent-camping development paying off?

Meng Li

Jul 05, 2025

Are Musk's tent camping and all-night development sessions paying off? The benchmark scores are high, yet the model has not been released.

Just now, benchmark test results for Grok 4 and Grok 4 Code were allegedly leaked.

X user @legit_api posted that Grok 4 scored 35% on HLE (Humanity's Last Exam) in standard mode, rising to 45% when using reasoning techniques, and 87-88% on GPQA, while Grok 4 Code achieved 72-75% on SWE-bench.

What do these benchmark results mean? Some users compared them with competing models such as OpenAI o3 and Claude Opus 4.
