AI Disruption

Google Launches Gemini 2.5 Flash: Cuts Costs by 6x with Hybrid Reasoning, Rivals o4-mini in Thinking Mode

Google's Gemini 2.5 Flash AI slashes costs by 6x with hybrid reasoning. Outperforms Claude 3.7, rivals GPT-4.5. Custom 'thinking mode' for optimal performance.

Meng Li
Apr 18, 2025



Google has just announced its first hybrid reasoning model, Gemini 2.5 Flash.

As with Claude, the new model’s “thinking budget” is customizable, and users can turn Gemini 2.5 Flash’s thinking mode on or off.

Notably, disabling thinking mode cuts output costs roughly sixfold (a reduction of about 83%), while performance remains comparable to Gemini 2.0 Flash.

Specifically, Gemini 2.5 Flash’s output price is $0.60 per million tokens with thinking disabled and $3.50 per million tokens with thinking enabled.
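
To make the "6x" figure concrete, here is a minimal sketch of the arithmetic using only the two output prices quoted above. The function names and the 2-million-token workload are illustrative assumptions, and the sketch assumes the same number of billed output tokens in both modes, which may understate the real gap if thinking produces additional billed tokens.

```python
# Output prices quoted in the article, in USD per one million output tokens.
PRICE_THINKING_OFF = 0.60
PRICE_THINKING_ON = 3.50


def output_cost(tokens: int, price_per_million: float) -> float:
    """Return the output cost in USD for a given number of output tokens."""
    return tokens / 1_000_000 * price_per_million


def compare(tokens: int) -> dict:
    """Compare output cost with thinking off vs. on for the same token count.

    Assumption (not from the article): both modes emit the same number
    of billed output tokens.
    """
    off = output_cost(tokens, PRICE_THINKING_OFF)
    on = output_cost(tokens, PRICE_THINKING_ON)
    return {
        "off_usd": off,
        "on_usd": on,
        "ratio": on / off,                    # how many times cheaper "off" is
        "savings_pct": (1 - off / on) * 100,  # percentage saved by disabling thinking
    }


if __name__ == "__main__":
    result = compare(2_000_000)  # hypothetical workload of 2M output tokens
    print(f"thinking off: ${result['off_usd']:.2f}")
    print(f"thinking on:  ${result['on_usd']:.2f}")
    print(f"ratio:        {result['ratio']:.2f}x")
    print(f"savings:      {result['savings_pct']:.1f}%")
```

The ratio comes out just under 6, which is why the headline saving is better described as "about 6x cheaper" or "about 83% lower" than as a 600% reduction.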

In general, the longer the model “thinks,” the better its performance, although the most effective budget varies by task.

On the GPQA knowledge Q&A benchmark, a 24k-token thinking budget improved performance by 6%; for coding tasks (LiveCodeBench), a 16k-token thinking budget yielded the best results.

© 2025 Meng Li