AI Disruption

Google Veo 3 Achieves First-Ever Audio-Visual Sync: Video Model "Speaks Directly"

Veo 3 by Google: AI videos now speak! Perfect lip-sync, sound effects & dialogue in one prompt. Viral 8-sec clips amaze social media.

Meng Li
May 22, 2025

Do you remember the most viral AI video clip of 2023? Will Smith eating noodles, with glitchy movements and no sound at all.


Back then, video models could only generate motion, not speech.

The release of Sora marked a leap in video quality and a significant advance in modeling physical rules, and it ignited the entire field.

Startups like Runway, Pika, Luma, Kling, Genmo, Higgsfield, and Lightricks, along with tech giants like OpenAI, Google, Alibaba, and ByteDance, all jumped into the race.

But no matter how much video quality improved, the videos remained “mute.”

You could make characters run, flip, or even move in slow motion. But what if you wanted them to speak, or wanted to hear the wind, footsteps, or even food sizzling in a pan?

Sorry, you’d have to add audio yourself.

Even more troublesome, the added audio might not sync properly: lip movements wouldn’t match the dialogue, footsteps wouldn’t land on time, and the emotional atmosphere always felt slightly off.

That changed today, when Google officially released Veo 3. AI videos can finally “speak.”

