‘Speaking Portrait’ Turns Photos into Eerily Realistic Talking Heads

Sep 27, 2021

Jaron Schneider

D-ID, the company whose tech powers the MyHeritage app, has demonstrated a new use for its technology. Called “Speaking Portrait,” it animates any photo with uncanny realism and can make the subject say whatever the user wants.

MyHeritage first made headlines in 2020 with its “Photo Enhancer” tool that used artificial intelligence to restore old family photos. Earlier this year, MyHeritage took its work a step further and developed Deep Nostalgia, which is capable of animating people from photographs.

As noted in PetaPixel’s original coverage, the AI at the core of the app was licensed from D-ID, which specializes in video reenactment using deep learning. D-ID did not appear satisfied to rest on the Deep Nostalgia laurels, however, as it has demonstrated a new application of its technology that can animate a photo and allow a user to control it in real time.

As reported by TechCrunch, the result can appear a lot like the “deepfakes” that are growing in accuracy online, but the technology behind a Speaking Portrait is supposedly quite different, and making a basic one requires no training. The implementation of this new technology was shown at TechCrunch Disrupt 2021, which concluded last week.

Speaking Portraits allows anyone to generate a full HD video from a source image and can combine that animation with either recorded speech or typed text. D-ID plans to launch the product with support for just three languages — English, Spanish and Japanese — but plans to add other languages as they are requested.

There are two categories of Speaking Portrait. The first is called a “trained character” and requires the submission of a 10-minute training video of the requested character that must comply with guidelines provided by D-ID. While this one takes a lot of work, it results in a character animation with a lot more fluidity, and it also supports the ability to swap out backgrounds.

Below is an example of an AI-generated newscaster that was made using the “Trained Character” method.

Using 10 minutes of training footage is reminiscent of technology that was deployed by Hour One in February. With it, users could create an “AI clone” of themselves that, once trained, could talk and move just like the original.

But perhaps more impressive is the other application of D-ID’s technology: “single portrait.” These can be made with any still photo and will animate the head while other parts stay static, and their backgrounds can’t be replaced. Still, the result below shows how movements and facial expressions performed by a real person are seamlessly added to a still photograph. The human can act as a sort of puppeteer of the still photo.

D-ID’s co-founder and CEO recognizes that there are potential hazards with this kind of technology and told TechCrunch that he and his company are “keen to make sure it’s used for good, not bad.” While the company will do its best to provide such assurances itself, it also plans to partner with others in the space to try to avoid abuse.