Date: 2025-02-05

Watch the short talking head video below. Granted, it is in French, and close inspection may raise suspicions, but a viewer caught unaware could well be fooled into believing it is a real video and not AI-generated.

The clip is from OmniHuman-1, an AI video system created by ByteDance — the Chinese company behind TikTok — which can deepfake a person using just one photo and one piece of audio.

OmniHuman-1 is just a research paper, for now, but the demos ByteDance is showing off are mightily impressive and appear to be an improvement on other deepfake apps that suffer from uncanny valley syndrome.

TechCrunch reports that OmniHuman-1 was trained on 19,000 hours of video content from “undisclosed sources,” which most likely means any video ByteDance could find on the internet or any other platform, copyrighted or not. The AI tool can also edit existing videos and change the movements of a person’s limbs. TechCrunch calls the results “astonishing.”

In the examples below, a woman giving a fake TED Talk achieves a good level of verisimilitude, while an AI Albert Einstein delivers a lecture in front of a chalkboard.

“We propose an end-to-end multimodality-conditioned human video generation framework named OmniHuman, which can generate human videos based on a single human image and motion signals (e.g., audio only, video only, or a combination of audio and video),” the ByteDance researchers write.

“In OmniHuman, we introduce a multimodality motion conditioning mixed training strategy, allowing the model to benefit from data scaling up of mixed conditioning. This overcomes the issue that previous end-to-end approaches faced due to the scarcity of high-quality data. OmniHuman significantly outperforms existing methods, generating extremely realistic human videos based on weak signal inputs, especially audio. It supports image inputs of any aspect ratio, whether they are portraits, half-body, or full-body images, delivering more lifelike and high-quality results across various scenarios.”

Users of OmniHuman-1 will get better results if they use high-quality, high-resolution reference images. ByteDance even shared a series of videos showing deepfakes talking with their hands, a part of the body that AI imagery notoriously struggles with.

The rise of deepfake technology has worrying implications in the real world: malicious actors try to use AI video to sway voters in elections by posting fake endorsements or besmirching an opposing politician’s name.

In February 2024, a finance worker was scammed into paying HK$200 million (US$25.6 million) to criminals after a virtual meeting with a deepfake impersonator.