Midjourney Flips the Formula with New Image-to-Text Generator

Apr 05, 2023

Jeremy Gray

Midjourney has announced a new “/describe” command that allows users to leverage the powerful artificial intelligence (AI) platform to transform images into words, upending Midjourney’s typical procedure of converting text to images.

Paul DelSignore describes the feature on Medium, writing that “describe” has numerous significant benefits for a wide range of use cases.

Today we're releasing a /describe command that lets you transform images-into-words. Give it a shot! We think this tool will transform your liguistic-visual process both in terms of creative power and discovery.

— Midjourney (@midjourney) April 4, 2023

One of the best aspects of the describe feature is that it should improve accessibility. Navigating the web can be challenging for people with visual impairments, and Alt text that describes images makes it more accessible. Writing Alt text manually is time-consuming, though, and Midjourney’s describe functionality may lower that hurdle.
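As a rough illustration of the Alt text idea, the Python sketch below turns one of the descriptions PetaPixel received in its test (quoted later in this article) into an HTML image tag. It is not part of Midjourney, which is used through chat commands; the function names `description_to_alt` and `img_tag`, the file name `portrait.jpg`, and the 125-character cap are all assumptions made for the example. The sketch keeps only the literal subject that comes before “in the style of” and drops the style keywords, which are useful for prompting but not for describing a photo.

```python
from html import escape

# Assumption: style keywords follow this phrase, as in the descriptions
# quoted in this article. Other outputs may be shaped differently.
STYLE_MARKER = ", in the style of"


def description_to_alt(description: str, max_len: int = 125) -> str:
    """Keep the literal subject of a generated description as Alt text."""
    # Everything before the style marker is treated as the subject.
    subject = description.split(STYLE_MARKER, 1)[0].strip()
    # Hypothetical length cap; shorten overly long subjects.
    if len(subject) > max_len:
        subject = subject[: max_len - 3].rstrip() + "..."
    # Capitalize the first letter without touching the rest.
    return subject[:1].upper() + subject[1:]


def img_tag(src: str, description: str) -> str:
    """Build an <img> tag with escaped src and alt attributes."""
    alt = description_to_alt(description)
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'


description = (
    "a young woman wearing a black and white polka dot dress standing, "
    "in the style of hazy landscapes, pensive portraiture, sun-soaked colors, "
    "candid portraiture, tumblewave, troubadour style, close up"
)
print(img_tag("portrait.jpg", description))
# <img src="portrait.jpg" alt="A young woman wearing a black and white polka dot dress standing">
```

A person should still review the result before publishing, since the generated descriptions can contain errors.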

The feature could also improve search, which benefits nearly every internet user. Search engines can index images more effectively when they include better and more plentiful descriptions.

DelSignore also highlights the importance of captions, as detailed captions help explain images and provide more clarity to viewers.

Image-to-text generation creates an interesting feedback loop with Midjourney’s text-to-image system. While Midjourney users can already generate similar images based on a selection, image-to-text tools may make it easier to develop alternate and potentially more fruitful descriptions for the text-to-image generator.

Gonna remix one of my images I created with Element 3D on AE

Using the /describe function to see what it says on #midjourney v5 is really interesting for prompt generation so will now see what they make. pic.twitter.com/BvkL3pu3SI

— GooRee (@GooRee) April 3, 2023

To upload a photo, users type “/describe” into the text field, and a drag-and-drop upload field appears. As with its text-to-image generator, Midjourney currently creates four different text descriptions of an uploaded image. It’s also possible to generate new variations based on a selected description.

Users can then select one of the generated descriptions and “remix” the uploaded image using the new text prompt. The user can also edit the text prompt, adding a new element of control to the creative process.

Testing Midjourney’s new /describe feature

PetaPixel tested the feature, first using a portrait captured by editor-in-chief Jaron Schneider.

Portrait shot with a Tamron 35mm lens. Photo by Jaron Schneider

Midjourney’s four generated descriptions are of varying quality.

young woman in polka dot dress standing on top of hill, in the style of pensive portraiture, lens flares, Voigtlander Heliar 15mm f/4.5, troubadour style, close up, calarts, simple

a young woman wearing a black and white polka dot dress standing, in the style of hazy landscapes, pensive portraiture, sun-soaked colors, candid portraiture, tumblewave, troubadour style, close up

a young woman in black and white checkered dress standing on slope of hill, in the style of anamorphic lens flare, polka dots, california plein air, pensive stillness, sun-soaked colors, candid portraits, delicate

leigh wyatt | san francisco weddings|daughtersmaidsoflake benson | san francisco, in the style of neo-pop sensibility, gentle lyricism

The first two descriptions are pretty good, especially the second one. It’s interesting that Midjourney described a specific Voigtlander 15mm prime lens, though, for the record, the image was shot with a Tamron 35mm f/1.8 prime. Using the second description to generate a remix leads to pretty impressive results.

Four AI-generated images that Midjourney v5 made using the “a young woman wearing a black and white polka dot dress standing, in the style of hazy landscapes, pensive portraiture, sun-soaked colors, candid portraiture, tumblewave, troubadour style, close up” description of the original portrait.

Using another image by Schneider, this time a landscape from Mono Lake in California, Midjourney again generates mostly useful text descriptions, although two of the four place the lake in Utah and only one correctly names California.

Mono Lake. Photo by Jaron Schneider

monolake, las vegas, utah, united states of america near crystal, in the style of shot on 70mm, mikalojus konstantinas ciurlionis, post processing, 32k uhd, antoni gaudí, hazy landscapes, fenghua zhong

mono lake at sunset after a rain, in the style of focus stacking, light sky-blue and bronze, 32k uhd, national geographic photo, stock photo, dansaekhwa

mono lake, utah, in the style of 32k uhd, balanced symmetry, american tonalist, hazy, dreamlike quality, nikon d850, fenghua zhong

mono lake, california, sunrise photograph 1, in the style of 32k uhd, isolated landscapes, low depth of field

Using the third description as a remix prompt, Midjourney delivered four very realistic new images.

Midjourney’s /describe tool is intriguing, even in its early state. The tool should help creators make more detailed Alt text, captions, and even different AI-generated artwork. While some parts of the descriptions are puzzling, to say the least, they show promise.

Image credits: Jaron Schneider and Midjourney
