Google’s newest Gemini Omni model can turn real videos into surreal fever dreams

TL;DR

Google is introducing Gemini Omni, a new multimodal model for generating videos.
Omni builds upon Veo and creates videos using text, audio, stills, and even actual videos.
In addition to the Gemini app, Omni will be available through Flow for paying Gemini users.
You can also use it for free to create Remixes using YouTube Shorts.

Video generation has been one of the most compelling creative uses of AI. Among the platforms that have helped fuel the phenomenon is Google’s Veo, especially Veo 3, which has proven incredibly powerful at creating entire scenes with consistent elements and nearly perfect lip-sync. While Veo 3 (and the newer 3.1) has been limited to creating purely AI-generated videos from text and audio, Google is introducing a new model at Google I/O 2026 that goes a step further by letting you transform real-life footage into spectacular clips.

Gemini Omni is Google’s new class of multimodal models that can reshape actual footage into something that would probably exist only in your head if it weren’t for AI. It’s arriving first in the form of Omni Flash, which, Google says, can combine multiple forms of input (text, audio, stills, and video) to generate something radically different in any of these formats. However, it’s starting with video, where users will be able to create clips as wild as their fever dreams while the model keeps characters consistent across multiple frames.