If you have two similar photos of two different people, Photoshopping one face onto the other isn’t very difficult. Change that to two video clips of two people talking, and you have a much more challenging task on your hands. That’s the problem Harvard University computational photography graduate student Kevin Dale decided to tackle. His research project, titled “Video Face Replacement,” introduces a way of doing this “digital face transplant” in a relatively automated way. The demonstration video above shows how effective his technique is at doing the ‘shop seamlessly.
In the abstract of the paper presenting this research, Dale writes,
We present a method for replacing facial performances in video. Our approach accounts for differences in identity, visual appearance, speech, and timing between source and target videos. Unlike prior work, it does not require substantial manual operation or complex acquisition hardware, only single-camera video. We use a 3D multilinear model to track the facial performance in both videos. Using the corresponding 3D geometry, we warp the source to the target face and retime the source to match the target performance. We then compute an optimal seam through the video volume that maintains temporal consistency in the final composite. We showcase the use of our method on a variety of examples and present the result of a user study that suggests our results are difficult to distinguish from real video footage.
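The "optimal seam" step is the part that hides the stitch: rather than pasting the warped face inside a fixed boundary, the method searches for a cut through the footage where source and target already agree, so the transition is invisible. Dale's paper computes this seam through the full 3D video volume; as a much-simplified illustration of the idea, the sketch below finds a minimal-cost vertical seam through a single 2D cost map (per-pixel disagreement between source and target) using dynamic programming, seam-carving style. The function names and the 2D simplification are this sketch's own, not the paper's.

```python
import numpy as np

def optimal_seam_2d(cost):
    """Find a minimal-cost vertical seam through a 2D cost map via
    dynamic programming: one column index per row, and adjacent rows
    may shift left or right by at most one pixel (connectedness)."""
    h, w = cost.shape
    acc = cost.astype(float).copy()
    # Forward pass: accumulate the cheapest path cost reaching each pixel.
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w, x + 2)
            acc[y, x] += acc[y - 1, lo:hi].min()
    # Backward pass: trace the cheapest path from the bottom row up.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(acc[-1].argmin())
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(acc[y, lo:hi].argmin())
    return seam

def composite_along_seam(source, target, seam):
    """Composite two same-sized frames: source pixels to the left of the
    seam, target pixels on and to the right of it."""
    out = target.copy()
    for y, x in enumerate(seam):
        out[y, :x] = source[y, :x]
    return out

# Toy usage: a cost map that is cheap in column 2, so the seam runs there.
cost = np.ones((5, 5))
cost[:, 2] = 0.0
seam = optimal_seam_2d(cost)
result = composite_along_seam(np.zeros((5, 5)), np.ones((5, 5)), seam)
```

Extending this to video means adding a time axis to the cost volume and penalizing seams that jump between frames, which is what gives the paper's composites their temporal consistency.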
It’s already difficult to distinguish between “real” and “fake” environments in TV shows, but it may soon be much more difficult to determine whether the person you see talking on screen is actually the actor or actress he or she appears to be.