Meta, the parent company of Facebook and Instagram, has demonstrated a new artificial intelligence (AI) model that can “cut out” any object in any image or video with a single click. Surprisingly, the company has also made it open source.
The AI, called the Segment Anything Model (SAM), takes a variety of input prompts that specify what to segment and reacts to them in real time. As Gizmodo notes in its report, there is already an abundance of AI-powered clipping or replacing systems on the market: Adobe Photoshop’s content-aware fill and Apple’s ability to “lift” a subject out of a photo and drop it into a text message are two notable examples. What Meta is proposing here, however, is different and more impressive.
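To get a feel for what prompt-driven segmentation means in practice, here is a minimal toy sketch in Python: a “click” supplies a seed point, and a flood fill grows a mask outward from it. This is purely illustrative and the function is invented for this article (SAM itself is a learned neural network, not a flood fill), but it shows the input/output contract of a point prompt: coordinates in, a per-pixel mask out.

```python
from collections import deque

def segment_from_point(image, seed, tol=0):
    """Toy point-prompted segmentation: flood-fill the connected region
    of pixels whose value is within `tol` of the clicked pixel.
    Illustrative only -- SAM is a learned model, not a flood fill."""
    h, w = len(image), len(image[0])
    sy, sx = seed
    target = image[sy][sx]
    mask = [[False] * w for _ in range(h)]
    mask[sy][sx] = True
    queue = deque([(sy, sx)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
                    and abs(image[ny][nx] - target) <= tol):
                mask[ny][nx] = True
                queue.append((ny, nx))
    return mask

# A 4x4 "image": a bright 2x2 square (9s) on a dark background (0s).
img = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
mask = segment_from_point(img, (1, 1))  # "click" on the bright square
print(sum(v for row in mask for v in row))  # → 4 pixels selected
```

The point is the interface, not the algorithm: a real promptable model accepts the same kind of seed (a click, a box, or eventually text) and returns a mask.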
Once an image is processed, the AI does a remarkably good job of isolating the major objects in it. For example, in the live demo, users can ask it to show everything it recognizes as an individual object, illustrating what the technology is doing. While it can’t necessarily pick out extremely fine details in a larger image, like each person in a large cityscape, it can find the majority of objects with relative ease.
As long as a subject is above a certain size, the AI is smart enough to pick it out even when it isn’t wholly in focus.
While a set of images is included in the demo, users can also upload their own samples to try with the system. For example, inputting the James Webb Space Telescope’s photo of the Tarantula Nebula, which is very busy and full of nondescript spots of light, caused it to struggle. On the flip side, uploading a photo of a pair of bikers from the EOS R6 II review produced far more impressive results.
It should be noted that this took just a few seconds, which adds to the impressive nature of the technology.
“SAM’s advanced capabilities are the result of its training on millions of images and masks collected through the use of a model-in-the-loop ‘data engine.’ Researchers used SAM and its data to interactively annotate images and update the model. This cycle was repeated many times over to improve both the model and the dataset,” Meta explains.
“After annotating enough masks with SAM’s help, we were able to leverage SAM’s sophisticated ambiguity-aware design to annotate new images fully automatically. To do this, we present SAM with a grid of points on an image and ask SAM to segment everything at each point. Our final dataset includes more than 1.1 billion segmentation masks collected on ~11 million licensed and privacy preserving images.”
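The grid-of-points procedure Meta describes can be sketched in miniature. The snippet below is a hypothetical stand-in rather than Meta’s code: it treats “segmentation” as simply grouping pixels by value, prompts the image at every grid point, and keeps one copy of each unique mask, which is the essence of prompting everywhere and deduplicating the results.

```python
def segment_everything(image, stride=2):
    """Toy sketch of automatic mask generation: lay a grid of prompt
    points over the image, produce a mask at each point, and keep only
    the unique masks. Here a "mask" is just the set of pixels sharing
    the prompted pixel's value -- a stand-in for a real model."""
    h, w = len(image), len(image[0])
    unique = set()
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            val = image[y][x]
            mask = frozenset((i, j) for i in range(h) for j in range(w)
                             if image[i][j] == val)
            unique.add(mask)
    return list(unique)

# A 4x4 "image": a bright 2x2 square (9s) on a dark background (0s).
img = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
masks = segment_everything(img, stride=2)
print(len(masks))  # → 2: the background and the bright square
```

Scaled up, the same loop structure is how a dense grid of prompts can turn an interactive model into a fully automatic annotator.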
Meta says SAM is capable of outputting multiple masks even when the prompt is somewhat ambiguous, which is particularly impressive considering this is just a foundational model designed to demonstrate the technology. Meta says it intends to make the system “promptable,” allowing it to receive input from a variety of sources, ranging from where a user wearing a VR headset is looking to text descriptions.
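The idea of returning multiple masks for an ambiguous prompt can also be illustrated with a toy example. In the hypothetical sketch below, a single click yields two candidate masks at different scales, the exact-value region and the whole nonzero object; the real SAM returns three learned masks, roughly corresponding to a subpart, a part, and the whole object.

```python
def ambiguous_masks(image, seed):
    """Toy illustration of ambiguity-aware output: for one prompt
    point, return several candidate masks at different scales. Here
    the candidates are the region matching the clicked value exactly
    and the entire nonzero object containing it."""
    h, w = len(image), len(image[0])
    val = image[seed[0]][seed[1]]
    part = {(i, j) for i in range(h) for j in range(w)
            if image[i][j] == val}       # the small patch that was clicked
    whole = {(i, j) for i in range(h) for j in range(w)
             if image[i][j] != 0}        # the full object around it
    return [part, whole]

# A dark object (1s) with a single bright patch (2) inside it.
img = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 2, 0],
    [0, 0, 0, 0],
]
part, whole = ambiguous_masks(img, (2, 2))  # click lands on the patch
print(len(part), len(whole))  # → 1 4
```

A click on the bright patch could plausibly mean “the patch” or “the whole object,” so the system hands back both and lets the downstream consumer choose.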
The full paper describing the technology has been published to Meta’s AI website. But perhaps most exciting given the company behind it, the software is open source. The full dataset that powers SAM can be downloaded from Meta as well, and the code is also available on GitHub.