Google Launches Gemini 2.5 Flash: Cuts Costs by 6x with Hybrid Reasoning – Rivals o4-mini in Thinking Mode

Google's Gemini 2.5 Flash AI slashes costs by 6x with hybrid reasoning. Outperforms Claude 3.7, rivals GPT-4.5. Custom 'thinking mode' for optimal performance.

Apr 18, 2025

Google has just announced Gemini 2.5 Flash, its first hybrid reasoning model.

As with Claude, the new model’s “thinking budget” is customizable, and users can turn Gemini 2.5 Flash’s thinking mode on or off.

Notably, disabling thinking mode makes output roughly 6x cheaper (a reduction of about 83%), while performance remains comparable to Gemini 2.0 Flash.

Specifically, Gemini 2.5 Flash’s output price is $0.60 per million tokens with thinking disabled and $3.50 per million tokens with thinking enabled.
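
As a quick sanity check on the savings, here is a minimal Python sketch of the arithmetic. It uses only the two output prices quoted above. The 2-million-token workload is a hypothetical figure chosen for illustration, and input-token pricing is ignored.

```python
# Output prices quoted in the article, in USD per million output tokens.
PRICE_THINKING_OFF = 0.60
PRICE_THINKING_ON = 3.50


def output_cost(tokens: int, price_per_million: float) -> float:
    """Return the output cost in USD for a given number of output tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical workload (an assumption for illustration, not from the article).
tokens = 2_000_000
cost_off = output_cost(tokens, PRICE_THINKING_OFF)
cost_on = output_cost(tokens, PRICE_THINKING_ON)

# How many times more expensive output is with thinking enabled.
ratio = PRICE_THINKING_ON / PRICE_THINKING_OFF
# Fraction of the output cost saved by disabling thinking.
saving = 1 - PRICE_THINKING_OFF / PRICE_THINKING_ON

print(f"thinking off: ${cost_off:.2f}, thinking on: ${cost_on:.2f}")
print(f"ratio: {ratio:.2f}x, saving: {saving:.1%}")
```

Under these assumptions the ratio comes out just under 6x, which corresponds to a saving of a little over 80% on output tokens rather than a reduction of several hundred percent.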

In general, the longer the model “thinks,” the better it performs, although the most effective budget varies by task.

On the GPQA knowledge Q&A benchmark, a 24k-token thinking budget improved performance by 6%; on coding tasks (LiveCodeBench), a 16k-token thinking budget yielded the best results.
