AI Disruption

Grok 4 Leaked Scores: 45% on HLE, Double Gemini 2.5

Leaked Grok 4 benchmark scores: 45% on HLE, double Gemini 2.5's score. Grok 4 Code hits 72-75% on SWE-bench. Is Musk's tent-camping development push paying off?

Meng Li
Jul 05, 2025


Are Musk's tent camping and all-night development sessions paying off? The benchmark scores are high, yet the model still has not been released.

Just now, benchmark test results for Grok 4 and Grok 4 Code were allegedly leaked.

Related post: Grok 4 Code Leaked! xAI Valued at $113B (Meng Li, Jul 2)

X blogger @legit_api posted that Grok 4 scored 35% on HLE (Humanity's Last Exam) in standard mode, rising to 45% when using reasoning techniques, and 87-88% on GPQA, while Grok 4 Code achieved 72-75% on SWE-bench.


What do these benchmark results mean? Some netizens compared them with competing models like OpenAI o3 and Claude Opus 4.
