Apple has released an open-source AI model for image editing, possibly offering a hint of what the company's upcoming generative AI features will look like.
The tool is built on a multimodal large language model, which, simply put, means it goes beyond interpreting text alone: it can combine analysis of text, images, video, audio, and other inputs to produce results. Apple's model is called MLLM-Guided Image Editing (MGIE) and was developed with researchers at the University of California, Santa Barbara. The paper describing the tool and its capabilities was accepted for presentation at the International Conference on Learning Representations (ICLR), a leading machine learning conference.
The paper explains how Apple's new model tackles one of the trickiest parts of putting AI into practice: vague user prompts. Users often give an AI model an instruction that seems simple enough to follow, but without another person on the other end to infer what they mean, the intent can get lost in translation.
For instance, the paper gives the example of a photo of a pizza paired with a prompt asking for it to be made healthier. A person would likely grasp the underlying intent, but for a computer, that request could be too vague. MGIE, however, first interprets the prompt and then rewrites it as a clearer, more concrete instruction, one that specifically asks for vegetables to be added to the pizza. As a result, the pepperoni pie is transformed into one topped with a vegetable mix of tomatoes and herbs.
While the results are fascinating in and of themselves, the paper reveals more than just the model. It may also offer a strong hint about what's to come when Apple launches its own native artificial intelligence features. Apple CEO Tim Cook recently said the company's generative AI tools would arrive within the year. Given Apple's focus on photos and the iPhone's built-in cameras, image editing has always seemed a good bet for the technology, and this published work on an innovative image editing model lends further credence to that idea.
Additionally, by publishing the paper and, as VentureBeat pointed out, releasing the model as open source on GitHub and on Hugging Face Spaces, a platform dedicated specifically to machine learning, Apple pulls back the curtain on how it might attempt to wade into AI waters responsibly. After all, Apple has long positioned itself as the tech company that cares about privacy, and it recently joined a consortium on responsible AI.
Image credits: Header photo licensed via Depositphotos.