AI Disruption

Grok 4 Leaked Scores: 45% on HLE, Double Gemini 2.5

Leaked Grok 4 benchmark scores: 45% on HLE, double Gemini 2.5's result. Grok 4 Code reaches 72-75% on SWE-bench. Is Musk's tent-camping development push paying off?

Meng Li
Jul 05, 2025



Are Musk's tent camping and all-night development sessions paying off? The leaked benchmark scores are high, yet the model has not been released.

Just now, benchmark test results for Grok 4 and Grok 4 Code were allegedly leaked.

Related: Grok 4 Code Leaked! xAI Valued at $113B (Meng Li, July 2, 2025)

X user @legit_api posted that Grok 4 scored 35% on HLE (Humanity's Last Exam) in standard mode, rising to 45% with reasoning techniques, and 87-88% on GPQA, while Grok 4 Code reached 72-75% on SWE-bench.

What do these benchmark results mean? Some users compared them with competing models such as OpenAI o3 and Claude Opus 4.
