Gemma 4 Delivers 3x Speedup, Day 0 Support for vLLM
Google Gemma 4 now runs 3x faster with MTP Drafter. Day 0 vLLM support, 2B–31B models, no quality loss.
Google made another move yesterday: after releasing Gemma 4 in early April, they have now shipped a “plug-in” that makes inference up to 3x faster — the MTP Drafter.
The official announcement is short but powerful:
“Same quality, way more speed.”
What is Gemma 4?
Key highlights:
Full parameter range from 2B to 31B — from phone-friendly E2B/E4B models all the way up to workstation-grade 31B Dense and 26B MoE.
True multimodal — supports text, image, video, and audio.
Strong reasoning — reaches 85%+ on MMLU Pro, placing it in the top tier of open-source models.
Insane adoption — over 60 million downloads in the first 4 weeks (per Google’s own data).
No matter how strong a model is, it’s useless if it can’t run fast. Today’s update is Google’s direct attack on the “running” part.
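A drafter like this typically builds on the speculative-decoding idea: a small, cheap model proposes several tokens ahead, and the large model verifies them in a single pass, keeping only the prefix it agrees with, so output quality is unchanged. The sketch below is a toy illustration of that draft-and-verify loop, not Google’s actual MTP Drafter; the `target` and `drafter` functions are deterministic stand-ins for real models.

```python
def speculative_step(target, drafter, context, k=4):
    """Draft k tokens with the cheap model, keep the prefix the
    target model agrees with, plus one token from the target itself."""
    # 1. Drafter proposes k tokens autoregressively (cheap).
    draft = []
    ctx = list(context)
    for _ in range(k):
        tok = drafter(ctx)
        draft.append(tok)
        ctx.append(tok)

    # 2. Target verifies each drafted position. A real system checks
    #    all k positions in one batched forward pass; a loop is used
    #    here only for clarity.
    accepted = []
    ctx = list(context)
    for tok in draft:
        if target(ctx) == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            break  # first disagreement: discard the rest of the draft

    # 3. The target always contributes the next token itself, so the
    #    output is identical to plain target-model greedy decoding.
    accepted.append(target(ctx))
    return accepted


# Toy models: each predicts the next character of a fixed string.
# The drafter is "wrong" at position 2, so only 2 drafted tokens
# are accepted before the target takes over.
target = lambda ctx: "speedup"[len(ctx)]
drafter = lambda ctx: "spXedup"[len(ctx)]

print(speculative_step(target, drafter, [], k=4))  # ['s', 'p', 'e']
```

When drafter and target usually agree, each verification pass of the large model yields several tokens instead of one, which is where the claimed speedup comes from; when they disagree, decoding falls back to ordinary one-token-at-a-time generation, so quality never degrades.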