NVIDIA GauGAN2 AI Turns Sentences into Realistic ‘Photos’

Nov 22, 2021

Jaron Schneider

Two examples of text descriptions being used to create photos with an AI model

NVIDIA’s GauGAN2 artificial intelligence (AI) can now use simple written phrases to generate a fitting photorealistic image. The deep-learning model is able to craft different scenes from just three or four words.

GauGAN is NVIDIA’s AI program that was used to turn simple doodles into photorealistic masterpieces in 2019, a technology that was eventually turned into the NVIDIA Canvas app earlier this year. Now NVIDIA has advanced the AI to the point where it only needs a brief description in order to generate a “photo.”

NVIDIA says that the deep-learning model behind GauGAN allows anyone to make beautiful scenes, and now it’s easier than it has ever been. Users can simply type in a phrase like “sunset at a beach” and the AI will generate the scene in real time as each word is added. If the user adds an adjective, as in “sunset at a rocky beach,” or swaps “sunset” for “afternoon” or “rainy day,” the model will modify the photo accordingly. The model is built on what is called a generative adversarial network (GAN).

“With the press of a button, users can generate a segmentation map, a high-level outline that shows the location of objects in the scene,” NVIDIA says. “From there, they can switch to drawing, tweaking the scene with rough sketches using labels like sky, tree, rock, and river, allowing the smart paintbrush to incorporate these doodles into stunning images.”

An AI-generated image created with the phrase, “a peaceful lake surrounded by tall trees in a foggy day.”

NVIDIA says that the demo is one of the first to combine multiple modalities within a single GAN. GauGAN2 combines segmentation mapping, inpainting, and text-to-image generation in a single model, which NVIDIA says makes it a powerful tool that allows users to create photorealistic art with a mix of words and drawings. The goal is to make it faster and easier to turn an artist’s vision into a high-quality AI-generated image. NVIDIA says that compared to other state-of-the-art models built specifically for text-to-image or segmentation map-to-image applications, GauGAN2 produces a greater variety of images at higher quality.

“Rather than needing to draw out every element of an imagined scene, users can enter a brief phrase to quickly generate the key features and theme of an image, such as a snow-capped mountain range,” NVIDIA says. “This starting point can then be customized with sketches to make a specific mountain taller or add a couple of trees in the foreground, or clouds in the sky.”

An AI-generated image created with the phrase, “a tropical island with white sand beach view from above.”

While the realistic image creation is probably the most impressive, GauGAN2 is not limited to that kind of scene. Artists can also use the demo to depict otherworldly, fictional landscapes. NVIDIA shows a scene that recreates something akin to the fictional Star Wars planet of Tatooine, where the desert scene is initially created by the model but a second sun is added afterward.

An AI-generated image created with the phrase, “endless tall mountains in a sunny day.”

“It’s an iterative process, where every word the user types into the text box adds more to the AI-created image.”

The text-to-image feature can be tested on NVIDIA AI Demos, where anyone can try creating custom scenes with text prompts and further adjust them with quick sketches to create more refined results.