Three-year-old children can make sense of what they see in photos and describe it to us, but even the most advanced computers have historically struggled with that same task. That’s quickly changing, though, as computer scientists develop powerful new ways to have computers identify what a photograph is showing.
The video above is a new TED talk given by Fei-Fei Li, a Stanford professor who’s one of the world’s leading experts on computer vision. She talks about ImageNet, her revolutionary project that has changed how computers “see.”
Li started out by teaching computers how to identify simple subjects of photos.
The next step is training a computer to describe a scene the way a 3-year-old would: with sentences instead of a list of words. Some of the results so far have been quite remarkable:
Others show that there’s still quite a bit of work to be done:
Microsoft recently announced that its technology was found to outperform humans in an ImageNet challenge, making mistakes on just 4.94% of photos compared to 5.1% for human test subjects. The computer was able to correctly identify what was shown in the vast majority of sample photos:
“Little by little, we’re giving sight to the machines,” Li says. “First, we teach them to see. Then, they help us to see better.”