Google AI Turns Text into Images with ‘Unprecedented Realism’

May 24, 2022

Jaron Schneider

Google Research Imagen

Google Research has developed an advanced artificial intelligence (AI) system that can turn any phrase into a strikingly realistic photo.

Google Research says the AI, called the Imagen diffusion model, can create images with an “unprecedented degree of photorealism and a deep level of language understanding.”

Turning brief text descriptions into images is not a new idea. Earlier this year, OpenAI’s DALL-E 2 AI demonstrated the ability to create images based only on a brief description and let users edit them with a simple set of tools to fine-tune the result. While the results were impressive, photorealism was secondary in those images; the system stood out mostly for how well it rendered results in different artistic styles.

Google’s approach seems to be more strongly tied to making images look like actual photos, and the published examples created with the system look far more like photos than the drawings created by OpenAI’s system.

Below is an example of what the system can create using the description, “A photo of a fuzzy panda wearing sunglasses and a red shirt, playing guitar in a garden.”

Google Research Imagen

As was the case with OpenAI’s system, direct access to the AI is not available to the public because Google doesn’t think it’s quite ready yet; all of the examples are pre-generated. Google’s system, and those like it, are trained on large sets of data that are pulled from the internet and are not curated, which can introduce some problems. In its current state, the system could be misused if it were given to the public.

“While this approach has enabled rapid algorithmic advances in recent years, datasets of this nature often reflect social stereotypes, oppressive viewpoints, and derogatory, or otherwise harmful, associations to marginalized identity groups,” the researchers explain.

“The Toronto skyline with google brain logo.”

“While a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language, we also utilized LAION-400M dataset, which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.”

The researchers found that the AI already exhibits social biases and that it tended to create images of people with lighter skin tones and put them into stereotypical gender roles, Engadget reports.

“In future work we will explore a framework for responsible externalization that balances the value of external auditing with the risks of unrestricted open-access,” the researchers say.

“A blue jay standing on a large basket of rainbow macarons.”

While direct access to the AI isn’t available, its capability is on full display on the research website, where the full research paper can also be read.

Header image: Photo on the left created using the phrase, “A photo of a Corgi dog riding a bike in Times Square. It is wearing sunglasses and a beach hat.” The photo on the right was created with the text, “A robot couple fine dining with Eiffel Tower in the background.”