Midjourney Flips the Formula with New Image-to-Text Generator
Midjourney has announced a new “/describe” command that lets users turn images into words, reversing the artificial intelligence (AI) platform’s usual process of converting text to images.
Paul DelSignore describes the feature on Medium, writing that “describe” has numerous significant benefits for a wide range of use cases.
Today we're releasing a /describe command that lets you transform images-into-words. Give it a shot! We think this tool will transform your liguistic-visual process both in terms of creative power and discovery.
— Midjourney (@midjourney) April 4, 2023
One of the best aspects of the describe feature is that it should improve accessibility. For people with visual impairments, navigating the web can be challenging, and alt text that describes images makes it more accessible. Writing alt text manually is time-consuming, and Midjourney’s describe functionality may help overcome this hurdle.
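As a rough illustration of the alt text use case, the Python sketch below wraps a description in an HTML image element. The function name `img_tag` and the file name are hypothetical, and the description is assumed to have been copied by hand from Midjourney's output. Nothing here calls Midjourney itself.

```python
from html import escape


def img_tag(src: str, description: str) -> str:
    """Build an <img> element that uses the description as its alt text.

    Hypothetical helper for illustration; the description is assumed to be
    text copied from a /describe result.
    """
    # Collapse runs of whitespace so the alt text reads as one clean sentence.
    alt = " ".join(description.split())
    # Escape both values so quotes or ampersands cannot break the attributes.
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'


print(img_tag(
    "portrait.jpg",
    "a young woman wearing a black and white polka dot dress standing",
))
```

Generated descriptions often include style terms that are of little use to a screen reader user, so a person would likely still want to review and trim the text before publishing it.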
Better image descriptions also improve search, which benefits nearly every internet user. Search engines can index images more effectively when they include better and more plentiful descriptions.
DelSignore also highlights the importance of captions, as detailed captions help explain images and provide more clarity to viewers.
Image-to-text generation creates an interesting feedback loop with Midjourney’s text-to-image system. While Midjourney users can already generate similar images based on a selection, image-to-text tools may make it easier to develop alternate and potentially more fruitful descriptions for the text-to-image generator.
Gonna remix one of my images I created with Element 3D on AE
Using the /describe function to see what it says on #midjourney v5 is really interesting for prompt generation so will now see what they make. pic.twitter.com/BvkL3pu3SI
— GooRee (@GooRee) April 3, 2023
As with its text-to-image generator, Midjourney currently creates four different text descriptions of an uploaded image. It’s also possible to generate new variations based on a selected description. To upload a photo, users type “/describe” into the text field, and a drag-and-drop upload field appears.
Users can then select one of the generated descriptions and “remix” the uploaded image using the new text prompt. The user can also edit the text prompt, adding a new element of control to the creative process.
Testing Midjourney’s new /describe feature
PetaPixel tested the feature, first using a portrait captured by editor-in-chief Jaron Schneider.
Midjourney’s four generated descriptions are of varying quality.
- young woman in polka dot dress standing on top of hill, in the style of pensive portraiture, lens flares, voigtlander heliar 15mm f/4.5, troubadour style, close up, calarts, simple
- a young woman wearing a black and white polka dot dress standing, in the style of hazy landscapes, pensive portraiture, sun-soaked colors, candid portraiture, tumblewave, troubadour style, close up
- a young woman in black and white checkered dress standing on slope of hill, in the style of anamorphic lens flare, polka dots, california plein air, pensive stillness, sun-soaked colors, candid portraits, delicate
- leigh wyatt | san francisco weddings|daughtersmaidsoflake benson | san francisco, in the style of neo-pop sensibility, gentle lyricism
The first two descriptions are pretty good, especially the second one. It’s interesting that Midjourney described a specific Voigtlander 15mm prime lens, though, for the record, the image was shot with a Tamron 35mm f/1.8 prime. Using the second description to generate a remix leads to pretty impressive results.
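Most of the descriptions above follow a loose pattern: a subject phrase, then “, in the style of”, then a comma-separated list of style terms. The Python sketch below splits a description along that pattern, which may help when editing a prompt before a remix. The pattern is an observation from these examples rather than a documented Midjourney format, and `split_description` is a hypothetical helper.

```python
def split_description(text: str) -> tuple[str, list[str]]:
    """Split a description into (subject, style terms).

    Assumes the "<subject>, in the style of <term>, <term>, ..." shape seen
    in the examples; this is not a documented Midjourney format.
    """
    marker = ", in the style of "
    subject, found, styles = text.partition(marker)
    if not found:
        # No style clause at all: treat the whole text as the subject.
        return text.strip(), []
    terms = [term.strip() for term in styles.split(",") if term.strip()]
    return subject.strip(), terms


subject, terms = split_description(
    "a young woman wearing a black and white polka dot dress standing, "
    "in the style of hazy landscapes, pensive portraiture, sun-soaked colors, "
    "candid portraiture, tumblewave, troubadour style, close up"
)
print(subject)
print(terms)
```

With the parts separated, a user could drop an inaccurate term, such as a lens that was not actually used, and rejoin the rest into a new prompt.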
For a second image by Schneider, this time a landscape from Mono Lake in California, Midjourney again generates mostly useful text descriptions, although two of the four place Mono Lake in the wrong location.
- monolake, las vegas, utah, united states of america near crystal, in the style of shot on 70mm, mikalojus konstantinas ciurlionis, post processing, 32k uhd, antoni gaudí, hazy landscapes, fenghua zhong
- mono lake at sunset after a rain, in the style of focus stacking, light sky-blue and bronze, 32k uhd, national geographic photo, stock photo, dansaekhwa
- mono lake, utah, in the style of 32k uhd, balanced symmetry, american tonalist, hazy, dreamlike quality, nikon d850, fenghua zhong
- mono lake, california, sunrise photograph 1, in the style of 32k uhd, isolated landscapes, low depth of field
Using the third description as a remix prompt, Midjourney delivered four very realistic new images.
Midjourney’s /describe tool is intriguing, even in its early state. The tool should help creators make more detailed Alt text, captions, and even different AI-generated artwork. While some parts of the descriptions are puzzling, to say the least, they show promise.
Image credits: Jaron Schneider and Midjourney