Google’s Veo Video Generator and Imagen 3 Try to Keep Up with OpenAI

Left image: A silhouette of a person riding a horse during sunset, with a grassy field and glowing sky in the background. Right image: Two spotted jellyfish swimming underwater, with a clear blue ocean as the backdrop.

Google unveiled its new artificial intelligence-backed video generator Veo, signaling a significant expansion in the world of AI art.

Google went all in on its Gemini AI model at the company’s I/O keynote on Tuesday, and there were plenty of photo and video updates on display. In addition to Veo, Google announced Imagen 3, an upgrade to its image creation tool, and expanded Gemini use cases with “Ask Photos,” which works with Google Photos.

Now that AI photos have become almost omnipresent, video feels ripe for disruption from large language models like Gemini. Just a few months ago, OpenAI made a big splash in the tech and video worlds when it released several short videos created with its AI video generation tool Sora. Even though these clips weren’t perfect, they were still considered hyper-realistic and highly detailed.

Google says Veo can create “high-quality 1080p resolution videos in a wide range of cinematic and visual styles that can go beyond a minute,” and at I/O it showcased footage that the company says is unedited raw output. Additionally, the tech giant says Veo can mimic real-world physics and understands queries that include specifications for time-lapses and aerial footage.

To create a video using Veo, users can opt for text, video, or image prompts, and additional prompts can be used for edits.

Over the next few weeks, Veo will be available only to “select creators in private preview in VideoFX,” the company’s new “experimental tool.” A waitlist is now open for anyone else who is interested.

While AI image and video generation have been met with criticism over whose art is used to train these models, and whether those artists are aware of or compensated for that use, Google didn’t address those matters. Instead, it leaned heavily into the positives and how the technology will bring video creation to the masses. And it’s not just Google saying it: famous actor, writer, rapper, and director Donald Glover is saying it too (at Google’s I/O keynote).

“Everybody’s going to become a director, and everybody should be a director,” Glover says in a video in which he, along with his creative studio Gilga, works on a short film using Veo. “Because at the heart of all of this is just storytelling. The closer we are to being able to tell each other our stories, the more we’ll understand each other.”

Google Launches Imagen 3

Not to be forgotten, Google’s AI text-to-image model Imagen also got some love during the I/O keynote with the announcement of the tool’s third iteration. Imagen 3 is meant to better understand “natural language, the intent behind your prompt and incorporates small details from longer prompts,” according to Google.

A picturesque view of a river winding through a lush, green mountainous landscape under a clear blue sky. Steep hillsides covered in dense foliage flank the river, which reflects the vivid sky. Rocks and sparse vegetation frame the foreground.
Prompt: View from above of beautiful river canyon with trees, showcasing its stunning natural beauty with green mountains and blue waters. The photo captures the vastness of nature’s creation in the style of its creation.
A picturesque landscape featuring three colorful hot air balloons floating over unique, rocky formations during sunrise or sunset. The sky is clear, and the distant mountains frame the scene, adding to the serene and captivating view.
Prompt: Shot in the style of DSLR camera with the polarizing filter. A photo of two hot air balloons floating over the unique rock formations in Cappadocia, Turkey. The colors and patterns on these balloons contrast beautifully against the earth tones of the landscape below. This shot captures the sense of adventure that comes with enjoying such an experience.

Three women are laughing and standing close together in the foreground, illuminated by warm sunlight from a low horizon behind them. The light creates a lens flare effect, highlighting their joyful expressions. They appear to be outdoors during sunset.

Details are meant to be richer, and the results less wonky (sorry, the results contain “fewer visual artifacts”). Further, the update means Imagen 3 is less likely to forget about the smaller details users might include in prompts. It’s also much better at rendering text. Users can sign up to try Imagen 3 in ImageFX, and it is “coming soon” to developers and enterprise customers in Vertex AI.

Image credits: Google