Meta has announced two new AI-powered imaging tools for Facebook and Instagram that will cover both stills and video.
The new features are built on Emu, the artificial intelligence software at the heart of Meta’s AI image offerings.
The first feature is called “Emu Edit” and it will allow users to alter images based on text inputs.
This may sound similar to Adobe Photoshop’s Generative Fill, but what differentiates Emu Edit is that users don’t have to select the element they want to change; they just describe it and the AI works out what the request refers to.
For example, the user can just write “remove the person” and without selecting anything, the person will disappear from the image.
“Emu Edit is capable of free-form editing through instructions, encompassing tasks such as local and global editing, removing and adding a background, color and geometry transformations, detection and segmentation, and more,” Meta writes in a blog post.
“Our key insight is that incorporating computer vision tasks as instructions to image generation models offers unprecedented control in image generation and editing.”
Meta says that it trained Emu Edit on “10 million synthesized samples, each including an input image, a description of the task to be performed, and the targeted output image” and believes it to be the largest dataset of its kind.
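To make the shape of such a sample concrete, here is a minimal Python sketch of one training triplet. The class name, field names, and file paths are hypothetical illustrations, not Meta's actual data format; only the three components (input image, task description, target image) come from Meta's description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditSample:
    """One hypothetical training triplet (not Meta's real schema)."""
    input_image_path: str   # the image before editing
    instruction: str        # text description of the task to perform
    target_image_path: str  # the image the model should produce


# Placeholder paths; the instruction mirrors the example in the article.
sample = EditSample(
    input_image_path="input.png",
    instruction="remove the person",
    target_image_path="target.png",
)
print(sample.instruction)
```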
“Current methods often lean towards either over-modifying or under-performing on various editing tasks,” Meta adds.
“We argue that the primary objective shouldn’t just be about producing a ‘believable’ image. Instead, the model should focus on precisely altering only the pixels relevant to the edit request.”
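One rough way to picture what “only the pixels relevant to the edit request” means is to count how many pixels change outside the region an edit was supposed to touch. The sketch below is an illustrative measure of our own, not a metric Meta says it uses; the function name, the tiny grayscale images, and the mask are all hypothetical.

```python
def changed_fraction_outside(before, after, mask):
    """Fraction of pixels outside the edit mask that differ between images.

    Images are lists of rows of pixel values; mask is a same-shaped list of
    booleans that is True where the edit was requested.
    """
    outside = 0
    changed = 0
    for row_b, row_a, row_m in zip(before, after, mask):
        for pixel_b, pixel_a, in_mask in zip(row_b, row_a, row_m):
            if not in_mask:
                outside += 1
                if pixel_b != pixel_a:
                    changed += 1
    return changed / outside if outside else 0.0


# A 3x3 grayscale image; the centre pixel (99) is the element to remove.
before = [[10, 10, 10],
          [10, 99, 10],
          [10, 10, 10]]
mask = [[False, False, False],
        [False, True, False],
        [False, False, False]]

# A precise edit changes only the masked pixel.
precise = [[10, 10, 10],
           [10, 10, 10],
           [10, 10, 10]]
# An over-modifying edit also alters two unrelated pixels.
over_modified = [[10, 12, 10],
                 [10, 10, 10],
                 [10, 10, 11]]

print(changed_fraction_outside(before, precise, mask))        # 0.0
print(changed_fraction_outside(before, over_modified, mask))  # 0.25
```

A lower value means the edit left more of the untouched image alone, which is the behavior Meta argues a model should aim for.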
The second AI offering from Meta is “Emu Video,” which leverages the same base Emu model to generate video from text prompts and also from still images.
Meta previously built an AI video generator called Make-A-Video, and the company says that Emu Video is a big improvement on it.
“Our state-of-the-art approach is simple to implement and uses just two diffusion models to generate 512×512 four-second long videos at 16 frames per second,” says Meta.
“In human evaluations, our video generations are strongly preferred compared to prior work—in fact, this model was preferred over Make-A-Video by 96% of respondents based on quality and by 85% of respondents based on faithfulness to the text prompt.”
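For a sense of scale, the clip specification Meta quotes (512×512 pixels, four seconds, 16 frames per second) can be worked through in a few lines of Python. The three-bytes-per-pixel figure is our assumption of uncompressed 8-bit RGB, not something Meta states.

```python
WIDTH = 512
HEIGHT = 512
DURATION_S = 4
FPS = 16
BYTES_PER_PIXEL = 3  # assumption: uncompressed 8-bit RGB

frames = DURATION_S * FPS                              # frames in one clip
pixels_per_frame = WIDTH * HEIGHT                      # pixels in each frame
raw_bytes = frames * pixels_per_frame * BYTES_PER_PIXEL
raw_mib = raw_bytes / 2**20                            # size in mebibytes

print(frames)            # 64
print(pixels_per_frame)  # 262144
print(raw_mib)           # 48.0
```

Each generated clip is therefore 64 frames long, or about 48 MiB of raw pixel data under that assumption.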
Meta has not said when it will release Emu Edit and Emu Video, only that the work is “purely fundamental research” for the moment, though the “potential use cases are clearly evident.”
Image credits: Courtesy of Meta.