OpenAI’s New Tech Lets You Generate Any ‘Photo’ By Just Describing It
OpenAI’s new DALL-E 2 artificial intelligence system is capable of creating photo-realistic images based only on a brief description and allows a person to easily edit the image with simple tools.
Original, Realistic Images Based On Text Descriptions
San Francisco-based OpenAI is an artificial intelligence company that is closely affiliated with Microsoft. Its latest product, DALL-E 2, is an AI image generator that can create realistic images and art from only a one-sentence description.
OpenAI’s DALL-E 2 website has multiple examples of artwork that has been generated from simple sentences. From “an astronaut playing basketball with cats in space” to “a bowl of soup that looks like a monster knitted out of wool,” the examples appear to be deliberately whimsical to show off the flexibility of the platform.
The program sounds very similar to NVIDIA’s GauGAN2, which can also turn sentences into realistic photos. The previous version of DALL-E could only produce cartoonish-looking images on a plain background that weren’t nearly as impressive as NVIDIA’s examples. This new version, however, can generate photo-quality images in high resolution with complex backgrounds, depth-of-field effects, shadows, shading, and reflections, reports Fortune.
The AI doesn’t just make images from scratch, either. Part of its strength is the ability to add objects to existing photos. For example, it can insert a sofa in a variety of shapes, sizes, and colors into different locations in an existing photo.
DALL-E 2 can also take an input image and generate different variations on it that are inspired by the original. For example, when provided a single source photo, the AI was able to generate multiple new images based on it.
OpenAI says that DALL-E 2 can make realistic edits to existing images based only on a brief description of the desired result and can add and remove elements while taking shadows, reflections, and textures into account.
“DALL·E 2 has learned the relationship between images and the text used to describe them. It uses a process called ‘diffusion,’ which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image,” OpenAI explains.
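The diffusion process OpenAI describes can be illustrated with a highly simplified sketch. This toy example is not OpenAI’s actual model (which operates on images and is guided by a learned text-image relationship); it only demonstrates the core idea of starting from random noise and repeatedly nudging the sample toward a recognized pattern. The `toy_diffusion` function and its parameters are illustrative inventions.

```python
import random

def toy_diffusion(target, steps=50, strength=0.2, seed=0):
    """Start from pure noise and gradually 'denoise' toward `target`.

    In a real diffusion model, `target` is not known in advance; a
    trained network predicts, at each step, how to alter the noisy
    sample so it looks more like an image matching the text prompt.
    Here we stand in for that network with the target pattern itself.
    """
    rng = random.Random(seed)
    # Begin with a pattern of random dots (pure noise).
    sample = [rng.uniform(-1.0, 1.0) for _ in target]
    for _ in range(steps):
        # Each step moves the sample a small fraction of the way
        # toward the pattern the "model" recognizes.
        sample = [s + strength * (t - s) for s, t in zip(sample, target)]
    return sample

# A tiny 1-D "image" standing in for the desired output.
target = [0.0, 0.5, 1.0, 0.5, 0.0]
result = toy_diffusion(target)
error = max(abs(r - t) for r, t in zip(result, target))
print(f"max deviation from target after denoising: {error:.6f}")
```

After 50 small steps, the noisy sample has converged almost exactly onto the target pattern, mirroring how diffusion gradually transforms random dots into a coherent image.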
“Our hope is that DALL-E 2 will empower people to express themselves creatively. DALL-E 2 also helps us understand how advanced AI systems see and understand our world, which is critical to our mission of creating AI that benefits humanity.”
Clearly, DALL-E 2 is not infallible. The system still has trouble rendering details in complex scenes and can struggle with shadow effects, as seen on the underside of the AI-generated couch in the photos above. Still, the technology is improving fast: DALL-E 2 is already miles ahead of the original DALL-E, which was first showcased just a year ago.
Getting Ahead of Potential Misuse
OpenAI is aware of some of the issues that can arise from an AI-powered image generation system. For now, the platform isn’t available to the public while the company studies how to deploy it responsibly. The company has already limited the AI’s ability to generate violent, hateful, or adult images and has removed the most explicit content from DALL-E 2’s training data. OpenAI says it has also used techniques to prevent the photorealistic generation of real people’s faces, including those of public figures.
“Our content policy does not allow users to generate violent, adult, or political content, among other categories,” OpenAI explains. “We won’t generate images if our filters identify text prompts and image uploads that may violate our policies. We also have automated and human monitoring systems to guard against misuse.”
OpenAI’s full DALL-E 2 research paper can be read on the company’s website.
Image credits: All photos by OpenAI. Header images generated from the description, “an astronaut riding a horse.”