Researchers Use Deep Learning to Add High-Quality Motion to Still Photos

Researchers at the University of Washington have developed a new deep learning method that essentially creates high-quality cinemagraphs automatically. The team says the method can animate any flowing material, including water, smoke, fire, and clouds.

The researchers say the method relies on a neural network trained to predict the future: given a single still photo, it estimates how the flowing material in the scene will move. The team trained the network on thousands of videos of waterfalls, rivers, oceans, and other materials with fluid motion. According to the University of Washington, the training process consisted of showing the network these videos and asking it to guess the motion of each one based only on its first frame.

From there, the network learned to infer from an image’s context clues what the motion was supposed to look like. Its output was compared to the actual video, and over time the network learned what to expect from different kinds of flowing matter.
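
To make that training loop concrete, here is a rough PyTorch sketch of the idea. It is an illustration, not the team’s code: the `MotionPredictor` architecture, the layer sizes, and the L1 loss are assumptions, and in practice the “true” motion would come from optical flow measured on the training videos.

```python
# Simplified illustration of the training idea, not the researchers' code.
# The network, layer sizes, and loss are assumptions for the sketch.
import torch.nn as nn

class MotionPredictor(nn.Module):
    """Toy stand-in for a network that maps an RGB frame to a per-pixel motion field."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # two channels: (dx, dy) per pixel
        )

    def forward(self, frame):       # frame: (B, 3, H, W)
        return self.net(frame)      # motion: (B, 2, H, W)

def train_step(model, optimizer, first_frame, true_motion):
    """One supervised step: guess the motion from the first frame alone,
    then compare the guess against motion measured from the real video."""
    optimizer.zero_grad()
    predicted_motion = model(first_frame)
    loss = nn.functional.l1_loss(predicted_motion, true_motion)
    loss.backward()
    optimizer.step()
    return loss.item()
```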

Original still image of the Snoqualmie Falls | Sarah McQuate/University of Washington

Initially, the team tried to animate a photo with a technique known as “splatting,” in which each pixel is moved according to its predicted motion. Unfortunately, this method had a problem.
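
In its simplest form, splatting can be sketched in a few lines of Python. The array shapes and nearest-pixel rounding below are illustrative assumptions, not the paper’s implementation:

```python
# Rough sketch of naive forward "splatting": every pixel is pushed to a new
# location given by its predicted motion. Shapes and rounding are assumptions.
import numpy as np

def splat(image, motion):
    """image: (H, W, 3) uint8, motion: (H, W, 2) pixel offsets (dx, dy)."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    ys, xs = np.mgrid[0:h, 0:w]
    new_x = np.clip(np.round(xs + motion[..., 0]).astype(int), 0, w - 1)
    new_y = np.clip(np.round(ys + motion[..., 1]).astype(int), 0, h - 1)
    # Pixels landing on the same spot overwrite each other; spots that
    # nothing lands on stay empty (black).
    out[new_y, new_x] = image[ys, xs]
    return out
```

Because pixels are only ever pushed out of their original positions, regions that keep losing pixels gradually empty out, which is exactly the problem Hołyński describes.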

“Think about a flowing waterfall,” lead author Aleksander Hołyński, a doctoral student in the Paul G. Allen School of Computer Science & Engineering, said. “If you just move the pixels down the waterfall, after a few frames of the video, you’ll have no pixels at the top!”

Original image of the Palouse Falls | Sarah McQuate/University of Washington

To address the issue, the researchers created what they call “symmetric splatting,” a method that essentially predicts both the future and the past for an image and combines the two into one animation.

“Symmetric splatting” | Hołyński et al./CVPR

“Looking back at the waterfall example, if we move into the past, the pixels will move up the waterfall. So we will start to see a hole near the bottom,” Hołyński said. “We integrate information from both of these animations so there are never any glaringly large holes in our warped images.”
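
The compositing step can be sketched roughly as well. The code below illustrates the idea under simplifying assumptions: it reuses a splat function like the one sketched earlier, uses linear crossfade weights, and treats all-zero pixels as holes. It is not the authors’ implementation.

```python
# Illustrative sketch of combining a forward (future) warp with a backward
# (past) warp, so holes in one are covered by the other. The weights and
# hole handling are simplified assumptions.
import numpy as np

def symmetric_frame(image, motion, t, num_frames, splat_fn):
    """Composite frame t of the loop from a forward warp (t steps into the
    future) and a backward warp (num_frames - t steps into the past)."""
    forward = splat_fn(image, motion * t)                       # future prediction
    backward = splat_fn(image, -motion * (num_frames - t))      # past prediction
    alpha = t / num_frames                                      # fade between the two
    forward_holes = forward.sum(axis=-1, keepdims=True) == 0    # empty pixels
    backward_holes = backward.sum(axis=-1, keepdims=True) == 0
    blended = (1 - alpha) * forward + alpha * backward
    blended = np.where(forward_holes, backward, blended)        # fill holes from the other warp
    blended = np.where(backward_holes, forward, blended)
    return blended.astype(image.dtype)
```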

Finally, the researchers designed the system to produce a clean, seamless loop so the animated still image can move endlessly. The method works best with objects that have a predictable fluid motion.
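
Under the assumptions of the sketches above, generating such a loop amounts to sweeping the crossfade through the full frame range, with `image` and `motion` standing in for the still photo and its predicted motion field. Because the fade runs from fully forward-warped to fully backward-warped, the first frame of one cycle lines up with the end of the previous one:

```python
# Hypothetical driver loop, reusing the sketches above (image and motion
# are the still photo and its predicted motion field).
num_frames = 60
frames = [symmetric_frame(image, motion, t, num_frames, splat) for t in range(num_frames)]
```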

The current method does not quite understand how to predict reflections on moving water or how water might distort objects below the surface. These are the same issues that plagued early cinemagraphs. The difference here, however, is that the motion of the water appears far more believable than what can be created with software tools like Flixel.

“When we see a waterfall, we know how the water should behave. The same is true for fire or smoke. These types of motions obey the same set of physical laws, and there are usually cues in the image that tell us how things should be moving,” Hołyński says. “We’d love to extend our work to operate on a wider range of objects, like animating a person’s hair blowing in the wind. I’m hoping that eventually the pictures that we share with our friends and family won’t be static images. Instead, they’ll all be dynamic animations like the ones our method produces.”

Palouse Falls | Sarah McQuate/University of Washington

The team has shared several examples of different fluids animated with the new deep learning algorithm. Compared to cinemagraphs, its results not only express motion more convincingly but also make it harder to notice when the animation loops. The team hasn’t made it clear what they intend to do with the technology, but they will present the approach at the Conference on Computer Vision and Pattern Recognition on June 22.
