Here’s something that will blow your mind: scientists have figured out how to extract audio from silent video. By analyzing the extremely small vibrations captured by a high-speed camera, researchers have been able to recreate music and speech from nothing but visual information.
The project was conducted by scientists at MIT, Microsoft, and Adobe, and it has pretty wild implications: in the future, someone might be able to eavesdrop on your conversations by pointing a microphone-less camera at a potato chip bag sitting near you.
That’s an actual example given by the researchers in the video below, which explains how this technology works:
The basic idea behind the experiment is that sound causes objects to vibrate, and those vibrations can be captured by a camera. The movements are so small that they can’t be detected by the human eye, much like the faint color changes a heartbeat produces in a person’s face.
“People didn’t realize that this information was there,” says MIT grad student Abe Davis.
Even though humans can’t see this data, computers can detect and process it. Throw in some fancy algorithms, and the magic begins to happen.
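As a rough illustration of the kind of processing involved (a minimal toy sketch with made-up numbers, not the researchers’ actual algorithm, which measures local motion with far more sophisticated filters), the snippet below simulates an edge in a video frame vibrating by a fraction of a pixel in time with a tone, then recovers that tone purely from the frame-to-frame displacement:

```python
import numpy as np

# Toy sketch, not the researchers' actual method: simulate an edge that
# vibrates by a fraction of a pixel with a 110 Hz tone, then recover the
# tone from the per-frame sub-pixel displacement.
fps = 2000           # assumed high-speed frame rate
tone_hz = 110        # assumed test tone, well below the Nyquist limit (fps / 2)
n_frames = 400

t = np.arange(n_frames) / fps
true_motion = 0.01 * np.sin(2 * np.pi * tone_hz * t)  # invisible to the eye

# Each "frame" is a 1-D blurry edge shifted by the vibration.
x = np.linspace(-5, 5, 200)
frames = np.array([1.0 / (1.0 + np.exp(-4.0 * (x - d))) for d in true_motion])

# First-order motion estimate: f(x - d) ~ f(x) - d * f'(x), so projecting
# each frame's difference from the mean onto the spatial gradient gives d.
mean_frame = frames.mean(axis=0)
grad = np.gradient(mean_frame, x)
recovered = -(frames - mean_frame) @ grad / (grad @ grad)

# The dominant frequency of the recovered motion is the original tone.
spectrum = np.abs(np.fft.rfft(recovered))
freqs = np.fft.rfftfreq(n_frames, d=1.0 / fps)
peak_hz = freqs[spectrum[1:].argmax() + 1]  # skip the DC bin
print(peak_hz)  # ~110 Hz
```

Even this crude gradient trick pulls a clean tone out of motions far smaller than a pixel, which hints at why the real system can recover intelligible speech.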
By pointing a high-speed camera at aluminum foil, a glass of water, earbuds lying on a desk, the leaves of a potted plant, and a potato chip bag (the last one through soundproof glass from 15 feet away), the scientists were able to recover music and intelligible speech.
Doing this generally requires cameras with a frame rate higher than the frequencies in the audio signal (the experiments used 2,000 to 6,000 frames per second), but the scientists were also able to recover usable audio from an ordinary DSLR shooting at 60 frames per second.
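To see why frame rate matters so much, here’s a small sketch (toy numbers, not figures from the research) of what happens when a tone is sampled below the Nyquist rate: a 440 Hz tone “measured” once per frame at 60 frames per second folds down to a much lower frequency, which is part of why audio from an ordinary camera comes out degraded:

```python
import numpy as np

# Toy illustration of the sampling constraint (numbers are assumptions, not
# from the research): a sampler must run at more than twice a frequency to
# capture it unambiguously. At 60 samples/s, a 440 Hz tone aliases.
fps = 60
tone_hz = 440
n = 600

t = np.arange(n) / fps
samples = np.sin(2 * np.pi * tone_hz * t)  # one "measurement" per frame

freqs = np.fft.rfftfreq(n, d=1.0 / fps)
spectrum = np.abs(np.fft.rfft(samples))
peak_hz = freqs[spectrum[1:].argmax() + 1]  # skip the DC bin
# 440 Hz folds into the 0-30 Hz band: |440 - 7 * 60| = 20 Hz.
print(peak_hz)  # aliased peak, ~20 Hz
```

The tone doesn’t vanish at the low frame rate; it just lands in the wrong place, so the recording is garbled rather than silent.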
The audio quality recovered from the ordinary camera wasn’t very good, but it was enough to reveal the gender of a speaker and the number of speakers in the area.
Davis says that this research is paving the way for what he calls “a new kind of imaging” — the capturing of sound through the camera lens.