Google Veo 3 Achieves First-Ever Audio-Visual Sync: Video Model "Speaks Directly"

Veo 3 by Google: AI videos now speak! Perfect lip-sync, sound effects & dialogue in one prompt. Viral 8-sec clips amaze social media.

May 22, 2025


"AI Disruption" publication: 6,500 subscriptions reached, with a 20% discount offer.

Do you remember the most viral AI video clip of 2023? Will Smith eating spaghetti: glitchy movements and a completely silent scene.

Back then, video models could only generate motion, not speech.

Then OpenAI's Sora arrived. It marked a leap in video quality and in modeling real-world physics, and it ignited the entire field.

Startups like Runway, Pika, Luma, Kling, Genmo, Higgsfield, and Lightricks, along with tech giants like OpenAI, Google, Alibaba, and ByteDance, all jumped into the race.

But no matter how much video quality improved, the videos remained “mute.”

You could make characters run, flip, or even move in slow motion. But what if you wanted them to speak, or wanted to hear the wind, footsteps, or food sizzling in a pan?

Sorry, you’d have to add audio yourself.

Worse, the added audio often failed to sync: lip movements didn’t match the dialogue, footsteps didn’t land on the beat, and the emotional atmosphere always felt slightly off.

That changed today, when Google officially released Veo 3. AI videos can finally “speak.”

Continue reading this post for free, courtesy of Meng Li.
