Google’s New Gemini Omni AI Video Model Can Do Crazy Things

Google’s new Gemini Omni artificial intelligence (AI) model can do some wild things. The model’s key promise is to create anything from, well, anything.

Google says its new Gemini Omni model can “create anything from any input,” including audio, video, photos, and text. The model starts with video generation, which users can then edit via conversational text with Gemini. This first model, Gemini Omni Flash, is launching now in the Gemini app, Google Flow, and YouTube Shorts.

As Google explains, editing AI-generated video using text is straightforward. The model also promises to keep things consistent after editing, including characters, and Omni can remember what was visible in previous scenes.

Prompt: Make the sculpture out of bubbles.

The company even promises that Gemini Omni can use its “intuitive understanding of physics,” effectively “bridging the gap from photorealism to meaningful storytelling.”

Prompt: A marble rolling fast on a chain reaction style track, continuous smooth shot.

Users have already achieved impressive results with Gemini Omni. For example, ex-Google product manager Bilawal Sidhu gave Gemini Omni a photo with a sketched drone path on it and had the AI generate drone POV footage.

The Verge’s Allison Johnson calls Omni “wild” and had the AI bring her child’s stuffed animal, Buddy, to life. Buddy went on exciting AI adventures, including white-water rafting and snowboarding.

“The results are such a mixed bag they’re baffling. Some were very good — much more consistent and true to my prompt than when I was testing out Veo five months ago,” Johnson writes. “But even the best clips Omni cooked up for me still have certain AI jump scares, like when Buddy suddenly switches orientation while he’s skydiving.”

Prompt: turn this into realistic footage, using the drawing only as a guide for movement, do not show the drawing in the final video

In Johnson’s testing, Omni’s biggest claim to fame, its ability to combine a wide variety of input media with AI-generated video, veered from technologically impressive to potentially hazardous. One of her deepfakes even convinced her husband, “a man who has looked at me in real life basically every single day for the last decade.”

Whether this is neat or terrifying depends on whom you ask.

“I can’t be the only one to think, that this just has no reason to exist,” writes near_photography on Threads in response to Johnson’s post above. “There is no net benefit to society from this capability.”

Prompt: Apply the pose and motion from input video to provided character from this image. Apply style from image reference to the new video

As Google notes, all videos generated using Omni include its “imperceptible SynthID digital watermark,” which lets users check whether something was made with Google’s AI from inside Gemini, Gemini in Chrome, and Google Search. But what if someone isn’t using those platforms?

Google is bringing this technology directly into YouTube Shorts and YouTube Create, for example, and it’s impossible to predict what people will do with it there.


Image credits: Google
