Video-to-audio research uses video pixels and text prompts to generate rich soundtracks.
Video generation models are advancing at an incredible pace, but many current systems can only generate silent output. One of the next major steps toward bringing generated movies to life is creating soundtracks for these silent videos.
Today, we’re sharing progress on our video-to-audio (V2A) technology, which makes synchronized audiovisual generation possible. V2A combines video pixels with natural language text prompts to generate rich soundscapes for the on-screen action.