ByteDance Releases OmniHuman: Generate Videos from a Single Image and Audio

OmniHuman by ByteDance revolutionizes portrait video generation with multimodal AI. Create vivid, high-quality videos from a single image and audio.

Feb 05, 2025

Do you remember Loopy, the audio-driven portrait animation technology that sparked heated discussion on X half a year ago?

An upgraded solution has arrived. The ByteDance Digital Human team has launched OmniHuman, a new multimodal digital human solution. Given a single image of any size and any body proportion, plus a piece of input audio, it generates a video of the person in the image. The generated video is vivid and looks highly natural.

For example, given the image below:

The characters generated by OmniHuman can move naturally in the video:
