Meta Claims to Have Built an Unrivaled AI Image Generator

Jul 19, 2023

Matt Growcoot

A CM3Leon example generated with the prompt “A small cactus wearing a straw hat and neon sunglasses in the Sahara desert.”

Facebook owner Meta claims to have built a state-of-the-art AI image generator that requires less computing power and less training data than the models currently on the market.

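Called CM3Leon (pronounced “chameleon”), it shuns the diffusion approach used by DALL-E 2, Stable Diffusion, and Midjourney. A diffusion model is trained by progressively adding Gaussian noise to images and learning to undo it; to generate a picture, it starts from random noise and removes that noise step by step until a (sometimes) coherent image emerges.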
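To make the noise-adding half of that process concrete, here is a minimal sketch of the forward step used in DDPM-style diffusion models, where a noisy image at step t is a weighted mix of the original pixels and Gaussian noise. The linear noise schedule, the 1,000 steps, and the four-pixel toy "image" are illustrative assumptions, not code from Meta or from any of the products named above, and the learned reverse (denoising) network that actually generates images is not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T rising linearly (an assumed, common choice)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def add_noise(x0, t, a_bars, rng):
    """Sample x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps, with eps ~ N(0, 1)."""
    a = a_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
pixels = [0.8, -0.2, 0.5, 1.0]  # toy "image": four pixel values scaled to [-1, 1]
a_bars = alpha_bars(linear_beta_schedule(1000))
print(add_noise(pixels, 10, a_bars, rng))   # early step: mostly the original signal
print(add_noise(pixels, 999, a_bars, rng))  # final step: almost pure noise
```

As t grows, the weight on the original pixels shrinks toward zero, which is why a trained model can start generation from pure noise and work backwards.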

Instead, CM3Leon is a transformer model, built around a mechanism called “attention” that weighs how relevant each part of the input is to every other part. Meta says this makes the model faster to train and means it needs less training data to begin with.
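The article does not describe CM3Leon's internals, but the core idea of attention can be sketched with textbook scaled dot-product attention: each token scores every other token for relevance, the scores become weights via a softmax, and the output is a weighted mix of the tokens. This is a simplified illustration, not Meta's code; real transformers first project tokens into separate queries, keys, and values with learned matrices, which this sketch skips by using the same toy vectors for all three.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query mixes the values,
    weighted by how well that query matches each key."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Relevance of every key to this query, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, one coordinate at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three toy "tokens" with 2-dimensional embeddings (illustrative numbers only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(tokens, tokens, tokens)  # self-attention: Q = K = V
for row in result:
    print([round(x, 3) for x in row])
```

Because every output is a weighted average, tokens judged more relevant to a given query contribute more to that query's result.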

How Was CM3Leon Trained?

Seemingly conscious of the backlash over the way AI image generators like Midjourney and Stable Diffusion were built, Meta says it licensed the training data for CM3Leon from Shutterstock.

According to TechCrunch, Meta built several versions of CM3Leon, with the best-performing model having seven billion parameters, more than twice as many as DALL-E 2. (Parameters are the internal values, or weights, that a model learns from its training data and then uses to make predictions.)
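To show what a parameter count measures, here is a small sketch that counts the learnable weights and biases in a toy fully connected network. The layer sizes are hypothetical and have nothing to do with CM3Leon's actual architecture; the point is only that every connection between layers adds one learned number, so counts climb quickly as layers get wider and deeper. In large transformers, most parameters likewise sit in weight matrices like these.

```python
def dense_layer_params(n_in, n_out):
    """A fully connected layer learns one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out

def count_params(layer_sizes):
    """Total learnable parameters in a simple stack of fully connected layers."""
    return sum(dense_layer_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

# A hypothetical toy network: 784 inputs -> 128 hidden units -> 10 outputs.
print(count_params([784, 128, 10]))  # 784*128 + 128 + 128*10 + 10 = 101770
```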

Multi-Modal

CM3Leon is a multi-modal AI image generator, meaning that it not only generates images but can also produce captions for them.

Meta gave the example of a dog with a stick. A user can ask CM3Leon “What is the dog carrying?”, to which the model replies “Stick.”

Asked to describe this image, CM3Leon responded: “In this image, there is a dog holding a stick in its mouth. There is grass on the surface. In the background of the image, there are trees.”

Users can also ask CM3Leon to describe an image in fine detail, which it appears to do well for the example image. This kind of technology could be useful for photographers who want to batch-caption thousands of their photos and generate keywords for them.

CM3Leon can also edit images based on text instructions. Meta used Johannes Vermeer’s Girl with a Pearl Earring as an example, asking CM3Leon to “put on a pair of sunglasses” and “What would she look like as a bearded man?” The same feature could be used to change the color of the sky in an image.

Girl with a Pearl Earring

“With CM3Leon’s capabilities, image generation tools can produce more coherent imagery that better follows the input prompts,” Meta writes in a blog post.

“We believe CM3Leon’s strong performance across a variety of tasks is a step toward higher-fidelity image generation and understanding.”

While Meta has built this intriguing new AI image generator, there seem to be no plans to release it, presumably because of the volatility and uncertainty surrounding generative AI models.