15 Mind-Blowing Interactive Worlds Google’s Genie 3 AI Can Create

A collage of diverse landscapes and scenes, including volcanoes, jellyfish, mountains, lakes, city streets, floating islands, a person in a wingsuit, a sunset road, and a boat ride, with arrow keys overlaid at the center.

Google announced Genie 3, a new general-purpose AI-powered world model that can generate diverse, photorealistic interactive environments.

Given a text prompt, Genie 3 can create dynamic worlds that users can explore in real time at up to 720p resolution and 24 frames per second. While 720p at 24 fps may not sound impressive at first, Genie 3 is a significant step forward: it is Google's first world model to allow complex, real-time interaction.

By contrast, Genie 2 topped out at 360p and offered only minimal movement inside its AI-generated worlds: users could perform a small set of actions for about 10 to 20 seconds. In Genie 3, they can navigate a world for multiple minutes and even interact with in-world objects.

A comparison table of four AI models (GameNGen, Genie 2, Veo, and Genie 3) showing differences in resolution, domain, control, interaction horizon, and latency. Genie 3 runs at 720p and excels in multiple categories.

It is also instructive to compare Genie 3 with Google's latest AI video generator, Veo 3. Veo 3 is a significant advance over Veo 2 and can output 4K video, but it has a notable limitation: clips are short, under 10 seconds, and interaction is limited to controlling the video output.

Genie 1, meanwhile, launched less than a year and a half ago. The progress Google DeepMind's researchers have made is remarkable, if not a bit scary.

Genie 3 can model real-world physics such as water and lighting, simulate plant and animal behavior, render fully modeled characters, and recreate real-world locations and even past eras.

“Achieving a high degree of controllability and real-time interactivity in Genie 3 required significant technical breakthroughs. During the auto-regressive generation of each frame, the model has to take into account the previously generated trajectory that grows with time,” Google explains.

“For example, if the user is revisiting a location after a minute, the model has to refer back to the relevant information from a minute ago. To achieve real-time interactivity, this computation must happen multiple times per second in response to new user inputs as they arrive.”
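Google has not published Genie 3's architecture, so the following is only a toy Python sketch of the idea described in the quote: each new frame is generated from the past trajectory plus the latest user action, and that trajectory is kept in a bounded memory. Everything in it is a hypothetical stand-in rather than Genie 3's method. The `Frame` type, the `next_frame` and `explore` helpers, the integer `scenery` value standing in for pixels, and the position lookup are all assumptions. The 1,440-frame window assumes 24 fps times roughly one minute of memory, based on the figures Google reports. A real world model works on learned representations rather than an exact-match search.

```python
import random
from collections import deque
from dataclasses import dataclass

FPS = 24                                # Genie 3's reported maximum frame rate
MEMORY_SECONDS = 60                     # assumption: "about a minute" of visual memory
CONTEXT_FRAMES = FPS * MEMORY_SECONDS   # 1,440 frames in this toy window

MOVES = {"left": -1, "right": 1, "stay": 0}  # hypothetical one-dimensional action space


@dataclass(frozen=True)
class Frame:
    step: int
    position: int
    scenery: int      # hypothetical stand-in for the generated pixels at this spot
    remembered: bool  # True if the content was recalled from the context window


def next_frame(history, action, rng):
    """Toy autoregressive step: condition on the past trajectory plus the new action."""
    last = history[-1]
    position = last.position + MOVES[action]
    # Scan the bounded trajectory (newest first) for an earlier view of this spot.
    for past in reversed(history):
        if past.position == position:
            return Frame(last.step + 1, position, past.scenery, remembered=True)
    # Nothing in memory: invent new content, which need not match what was there before.
    return Frame(last.step + 1, position, rng.randrange(1000), remembered=False)


def explore(actions, context_frames=CONTEXT_FRAMES, seed=0):
    """Generate one frame per action; frames older than the window are forgotten."""
    rng = random.Random(seed)
    first = Frame(0, 0, rng.randrange(1000), remembered=False)
    history = deque([first], maxlen=context_frames)  # oldest frames fall out automatically
    frames = [first]
    for action in actions:
        frame = next_frame(history, action, rng)
        history.append(frame)
        frames.append(frame)
    return frames
```

For example, `explore(["right"] * 10 + ["left"] * 10)` returns to the starting spot after 20 frames, well inside the window, so the recalled scenery matches the first frame. After `["right"] + ["stay"] * 1500 + ["left"]`, the starting frame has already been evicted from the 1,440-frame window, so the sketch has to invent that spot again. The per-frame scan also grows with the window. At 24 fps the whole step has a budget of roughly 41.7 ms, which gives a feel for why Google calls real-time interaction over a growing trajectory a hard problem.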

Google notes that keeping a foundation world model consistent over time is also highly challenging, because seemingly minor inaccuracies quickly snowball. Genie 3 has a visual memory of about a minute: if a user navigates away from an object and then comes back, the object should still be where it was. That is a significant accomplishment and a first for Google.

Google acknowledges several limitations: a limited action space, difficulty with multi-agent interaction in generated worlds, poor text rendering (a common issue for generative AI), and occasionally inaccurate geographic modeling of real locations. Nonetheless, the foundational technology on display here is remarkable.

Genie 3 is currently available only to a select group of academics and researchers, but Google is exploring how to bring it to additional testers soon.


Image credits: Google. The complete prompts used to generate all 15 examples above are detailed in Google's Genie 3 blog post.
