Stability AI Previews Next-Gen AI Image Generator Stable Diffusion 3

Stable Diffusion 3 example. Prompt: “Trees photographed under the Milky Way, the moon and twilight shine on the Valley. The full moon appears high in the sky and the twilight glow can still be seen.”

Stability AI has announced a fresh iteration of its AI image generator in the form of Stable Diffusion 3.

There are scant details about the model, but Stability AI is touting its ability to render coherent text; AI image generators notoriously struggle with text, often producing garbled words.

The company also says that Stable Diffusion 3 is a next-generation image-synthesis model and, like previous versions of Stable Diffusion, it will be open-weights and open-source, meaning users can run the model locally and fine-tune it.

As mentioned, the company has not revealed much about Stable Diffusion 3. It has not been made available to the public but there is a waitlist as the company seeks to “improve its quality and safety.”

Photo studio shot of a chameleon, generated by Stable Diffusion 3.
The newest model does well with text, apparently.

Training Data

Stability AI is embroiled in a lawsuit with Getty Images over training data, with the photo agency alleging that the AI company unlawfully used 12 million of its photos.

Nothing has really been said about the training data for Stability's new product, with the announcement blog post focusing more on safety, a constant problem for AI image generators.

“Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment,” it writes.

“In preparation for this early preview, we’ve introduced numerous safeguards. By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we approach the model’s public release.”

Stable Diffusion 3 example

A “detailed technical report” will be released, but Stability AI CEO Emad Mostaque has been on X (formerly Twitter) revealing some details.

“This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements,” he writes.

“This takes advantage of transformer improvements and can not only scale further but accept multimodal inputs.”

The model will use a diffusion transformer architecture rather than the U-Net architecture of earlier Stable Diffusion versions. Transformers work on small patches of the image at a time and are apparently well suited to handling patterns and sequences.
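For readers who want a concrete picture, here is a rough sketch, in PyTorch and not Stability AI's actual code, of how a diffusion transformer typically splits a latent image into a grid of patches and treats each patch as a token:

```python
import torch

def patchify(latent: torch.Tensor, patch_size: int = 2) -> torch.Tensor:
    """Split a latent image into a sequence of patch tokens.

    latent: (batch, channels, height, width) tensor, e.g. a VAE latent.
    Returns: (batch, num_patches, patch_size * patch_size * channels).
    """
    b, c, h, w = latent.shape
    # Cut the image into non-overlapping patch_size x patch_size squares.
    patches = latent.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    # (b, c, h//p, w//p, p, p) -> (b, h//p * w//p, c * p * p)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch_size * patch_size)
    return patches

# Example: a 64x64 latent with 4 channels becomes 1,024 tokens of length 16,
# which a transformer can then process as a sequence.
tokens = patchify(torch.randn(1, 4, 64, 64))
print(tokens.shape)  # torch.Size([1, 1024, 16])
```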

Stable Diffusion 3 will also use a technique called “flow matching,” which generates images by learning how to move random noise toward a coherent image, focusing on the overall direction of that transformation rather than on each individual denoising step.
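Again as a rough illustration rather than Stability AI's published method, one common flow matching formulation (rectified flow along a straight-line path) trains the network to predict the constant velocity that carries noise to the image. The model(x_t, t) signature below is an assumption made for the sketch:

```python
import torch

def flow_matching_loss(model, image: torch.Tensor) -> torch.Tensor:
    """One training step of a simple, rectified-flow-style flow matching objective.

    model(x_t, t) is assumed to predict the velocity of the flow at time t.
    """
    noise = torch.randn_like(image)             # x_0: pure noise
    t = torch.rand(image.shape[0], 1, 1, 1)     # random time in [0, 1]
    x_t = (1 - t) * noise + t * image           # point on the straight noise-to-image path
    target_velocity = image - noise             # overall direction from noise to image
    predicted_velocity = model(x_t, t)
    return torch.mean((predicted_velocity - target_velocity) ** 2)
```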

It was only last week that Stability AI previewed an entirely new text-to-image model called Stable Cascade, and the company is also making a foray into generative AI video.


Image credits: Stability AI
