Three-year-old children can make sense of what they see in photos and describe it to us, but even the most advanced computers have historically struggled with that same task. That's quickly changing, though, as computer scientists develop powerful new ways for computers to identify what a photograph is showing.
The video above is a new TED talk given by Fei-Fei Li, a Stanford professor who’s one of the world’s leading experts on computer vision. She talks about her revolutionary ImageNet project that has changed how computers “see.”
The project uses a database of 15 million photographs to teach computers to recognize things in pictures. After a monumental effort building and polishing the collection, Li's team released the dataset to the world for free, and it has since become one of the industry's standard benchmarks for how well computers perform at image recognition.
Li started out by teaching computers to identify the simple subjects of photos.
The next step is training a computer to describe the scene the way a three-year-old would: with sentences instead of a list of words. Some of the results so far have been quite remarkable:
While others show that there's still quite a bit of work to be done:
Microsoft recently announced that its technology was found to outperform humans in an ImageNet challenge, making mistakes on just 4.94% of photos compared to 5.1% for human test subjects. The computer was able to correctly identify what was shown in a vast majority of sample photos:
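For readers curious what those percentages measure: ImageNet classification results are commonly scored by "top-5 error," where a guess counts as correct if the true label appears anywhere in the model's five highest-ranked predictions. Assuming that's the metric behind the 4.94% and 5.1% figures above, here's a minimal sketch of how it's computed (the labels and data below are purely illustrative):

```python
# Sketch of the top-5 error metric used in ImageNet-style challenges.
# A prediction is a "hit" if the true label is among the model's
# five highest-scoring guesses; the error rate is the miss fraction.

def top5_error(predictions, true_labels):
    """predictions: one ranked (best-first) guess list per image.
    true_labels: the correct label for each image."""
    misses = sum(
        1 for guesses, truth in zip(predictions, true_labels)
        if truth not in guesses[:5]
    )
    return misses / len(true_labels)

# Toy example: four images with hypothetical ranked guesses.
preds = [
    ["cat", "dog", "fox", "wolf", "lynx"],   # truth "cat"  -> hit
    ["car", "bus", "van", "truck", "tram"],  # truth "bike" -> miss
    ["oak", "pine", "elm", "fir", "ash"],    # truth "elm"  -> hit
    ["cup", "mug", "bowl", "jar", "pot"],    # truth "mug"  -> hit
]
truths = ["cat", "bike", "elm", "mug"]
print(top5_error(preds, truths))  # 0.25
```

In other words, the headline numbers mean the computer failed to place the correct answer in its top five guesses for about 1 in 20 test images, slightly less often than the human test subjects did.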
“Little by little, we’re giving sight to the machines,” Li says. “First, we teach them to see. Then, they help us to see better.”