MegaPortraits: High-Res Deepfakes Created From a Single Photo

High-Resolution Avatars

Researchers from Samsung Labs have developed a way to create high-resolution avatars, or deepfakes, from a single still frame photo or even a painting.

The team says that the system, called MegaPortraits, can create megapixel-sized avatars from a single-frame starting point. It also says that its advances tackle a particularly challenging case: when the appearance of the driving image is substantially different from that of the animated source image. For example, it is much harder to drive an effective deepfake of Angelina Jolie when the person driving it is a short-haired male.

The deepfakes of Tom Cruise that appeared on TikTok last year were so effective because the performer had a similar appearance to the famous actor and was able to replicate his mannerisms. It would have been far harder to create a deepfake model of Cruise if the performer had looked very different from him.

This method appears to solve that problem and allows anyone to control realistic avatars even if they do not closely resemble the target.

“We propose a set of new neural architectures and training methods that can leverage both medium-resolution video data and high-resolution image data to achieve the desired levels of rendered image quality and generalization to novel views and motion,” the researchers state in their abstract.

“We show that suggested architectures and methods produce convincing high-resolution neural avatars, outperforming the competitors in the cross-driving scenario.”

The system for making high-resolution human avatars is called megapixel portraits, or MegaPortraits for short. The researchers explain that the model is trained in two stages, and they have also proposed a third stage that allows it to work faster.

“Our training setup is relatively standard. We sample two random frames from our dataset at each step: the source frame and the driver frame. Our model imposes the motion of the driving frame (i.e., the head pose and the facial expression) onto the appearance of the source frame to produce an output image,” the team explains.

“The main learning signal is obtained from the training episodes where the source and the driver frames come from the same video, and hence our model’s prediction is trained to match the driver frame.”
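The sampling step the researchers describe can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy three-pixel "frames," the placeholder model, and the use of a simple L1 loss are all hypothetical stand-ins, and the article does not say which loss functions the team actually uses.

```python
import random

def sample_episode(videos, rng):
    """Pick a source frame and a driver frame from the same video."""
    video_id = rng.choice(sorted(videos))
    source, driver = rng.sample(videos[video_id], 2)  # two distinct frames
    return video_id, source, driver

def l1_loss(prediction, target):
    """Mean absolute pixel difference between two equal-length frames."""
    return sum(abs(p - t) for p, t in zip(prediction, target)) / len(target)

def toy_model(source, driver):
    """Placeholder for the real network: returns the source unchanged."""
    return list(source)

# Hypothetical dataset: each video is a list of tiny three-pixel frames.
videos = {
    "clip_a": [[0.0, 0.1, 0.2], [0.1, 0.2, 0.3], [0.2, 0.3, 0.4]],
    "clip_b": [[0.9, 0.8, 0.7], [0.8, 0.7, 0.6]],
}

rng = random.Random(0)
video_id, source, driver = sample_episode(videos, rng)
prediction = toy_model(source, driver)

# Same-video episode: the driver frame doubles as the training target.
loss = l1_loss(prediction, driver)
```

Because both frames show the same person, the driver frame is a valid picture of what the output should look like, which is why it can serve as the target.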

Perhaps even more impressive, the team has managed to make this eerily effective system run as a compressed model that works in real time.

“We show how a trained high-resolution neural avatar model can be distilled into a lightweight student model which runs in real-time and locks the identities of neural avatars to several dozens of pre-defined source images,” the researchers continue. “Real-time operation and identity lock are essential for many practical applications [of] head avatar systems.”
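The idea of distilling a large model into a student that is locked to a fixed set of identities can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the additive "teacher," the per-identity appearance vectors, the squared-error objective, and all names are hypothetical and are not the researchers' architecture.

```python
import random

# Fixed set of source "images" the student is locked to (toy 3-pixel vectors).
LOCKED_SOURCES = [[0.2, 0.4, 0.6], [0.9, 0.1, 0.5]]

def teacher(source, driver):
    """Toy stand-in for the large model: combines appearance and motion."""
    return [s + d for s, d in zip(source, driver)]

class Student:
    """Small model that stores one learned appearance vector per identity."""

    def __init__(self, num_identities, size):
        self.appearance = [[0.0] * size for _ in range(num_identities)]

    def __call__(self, identity, driver):
        # Identity lock: only the pre-defined identities can be animated.
        if not 0 <= identity < len(self.appearance):
            raise ValueError("identity is not one of the locked sources")
        return [a + d for a, d in zip(self.appearance[identity], driver)]

def distill(student, steps=500, lr=0.1, seed=0):
    """Fit the student to the teacher's outputs with plain gradient descent."""
    rng = random.Random(seed)
    for _ in range(steps):
        identity = rng.randrange(len(LOCKED_SOURCES))
        driver = [rng.random() for _ in LOCKED_SOURCES[identity]]
        # The teacher's output serves as the student's training target.
        target = teacher(LOCKED_SOURCES[identity], driver)
        output = student(identity, driver)
        # Gradient of 0.5 * squared error with respect to each appearance value.
        for i, (o, t) in enumerate(zip(output, target)):
            student.appearance[identity][i] -= lr * (o - t)

student = Student(len(LOCKED_SOURCES), size=3)
distill(student)
```

In this sketch the student never sees a source image at inference time. It takes only an identity index and a driving signal, which is what keeps it small and prevents it from animating faces outside the pre-defined set.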

The results the team has shared demonstrate the efficacy of the program. The full project, titled MegaPortraits: One-shot Megapixel Neural Head Avatars, has been published as a scientific paper.