Researchers from Cornell University have announced a breakthrough in visual machine learning that allows robots to understand objects that are obscured by other objects.
Until now, robots have been able to interpret the world around them from visual information, but they struggle with context. For example, robots have not been able to infer the full shape and extent of objects that are occluded (or covered up) by other objects. This kind of instance-aware segmentation of unseen object regions is crucial for the future of robotic manipulation and navigation but has thus far proved challenging to teach to machines.
Take, for example, a bowl of cereal. Imagine a table with a bowl of cereal with a spoon in it, the cereal box itself, and a cup of coffee.
If a person looked at that setting from across the room, the cup of coffee might sit in front of the cereal box. Humans understand that the cup is only partially obstructing the box, and even though the box is partially blocked, a person still knows it is there in its entirety even if it isn't fully visible. Humans can also imagine what the hidden part of the cereal box looks like from that angle. Robots have lacked this nuanced understanding of scenes, until now.
The researchers state that although previous works achieved encouraging results, they were limited to segmenting only the visible regions of unseen objects.
“For robotic manipulation in a cluttered scene, amodal perception is required to handle the occluded objects behind others,” the researchers say.
The team proposed what they call a Hierarchical Occlusion Modeling (HOM) scheme, which they say is designed to reason about hidden objects by “assigning a hierarchy to a feature fusion and prediction order.”
In short, once the robot could understand that objects were obstructed and continued behind other objects, the researchers were able to train it to remove the foreground objects to reach the desired object behind them. They tested their method in three environments: tabletop, indoor, and bin scenes. In many instances, the robot was able to identify objects nearly identically to the “ground truth” control image.
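To illustrate the core idea in rough terms, here is a toy sketch (not the researchers' code; the masks and names are invented for illustration). An amodal prediction covers an object's full extent, visible and hidden alike, so the occluded region can be recovered by subtracting the visible mask from the amodal one, telling the robot whether something must be cleared away first:

```python
import numpy as np

# Toy 1x5 "scene": a cereal box spans columns 1-3, but a cup in front
# hides column 2. These hand-made masks stand in for a network's output.
amodal_mask = np.array([0, 1, 1, 1, 0], dtype=bool)   # full predicted extent
visible_mask = np.array([0, 1, 0, 1, 0], dtype=bool)  # what the camera sees

# The occluded region is the part of the amodal mask that is not visible.
occluded = amodal_mask & ~visible_mask
print(occluded.astype(int))  # [0 0 1 0 0] -> column 2 is hidden

# A robot can act on this: if any part of the target is occluded,
# a foreground object needs to be moved before grasping.
needs_clearing = occluded.any()
print(needs_clearing)  # True
```

The actual system predicts these masks per object instance from camera images; the hierarchy in the HOM scheme governs the order in which the visible, amodal, and occlusion predictions are fused.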
Despite these impressive results, there were some cases of failure, particularly in complex environments or against complex backgrounds, such as the example below:
This technology may have a significant impact on camera development in the future. While object detection already exists in current systems, techniques such as this may push accuracy and tracking capabilities even further.
Image credits: Header photo licensed via Depositphotos.