Runway’s first artificial intelligence (AI) software, Gen-1, was able to make new videos using data from existing videos. While impressive, its new Gen-2 software can create full videos entirely from text descriptions, a huge leap for the technology.
The company has been working on this model since last September. Now, Runway Gen-2 is the first publicly available text-to-video model on the market and is able to “realistically and consistently” synthesize new videos. Gen-2 builds on Gen-1’s capability, which applied the composition and style of an image or text prompt to the structure of a source video to make a new video (what Runway calls video-to-video), and adds the ability to create entirely new video content from nothing but text descriptions.
Generate videos with nothing but words. If you can say it, now you can see it.
Introducing, Text to Video. With Gen-2.
— Runway (@runwayml) March 20, 2023
The web-based platform is able to generate relatively high-resolution videos (compared to what is otherwise available on the market, anyway) that, while not photorealistic, do show the power of the technology. For example, the company says that a text prompt of just “An aerial shot of a mountain landscape” was able to create the video below:
The company was also able to make the short video below based on the simple prompt, “a close up of an eye.”
“Deep neural networks for image and video synthesis are becoming increasingly precise, realistic, and controllable. In a couple of years, we have gone from blurry low-resolution images to both highly realistic and aesthetic imagery allowing for the rise of synthetic media,” the company says.
“Runway Research is at the forefront of these developments and we ensure that the future of content creation is both accessible, controllable and empowering for users. We believe that deep learning techniques applied to audiovisual content will forever change art, creativity, and design tools.”
While these clips can’t seamlessly replace actual videos yet, the advancement from where the models started suggests that they very soon may be able to, especially if the developments follow the same general track as text-to-image generators like Midjourney. For example, just last year Midjourney was not able to reliably make images that could pass as actual photos, but with the launch of version 5 last week, that changed.
Below are a couple more examples. First, a “sunset through a window in a New York apartment:”
And finally, “Drone footage of a desert landscape:”
It’s worth noting that while Runway is the first to bring its tech to the public, it’s not working on this technology alone. Google, for example, has been experimenting with text-to-video generation for some time. Just as there are many players in the text-to-image space, text-to-video is likely to see many competitors crop up quickly as the technology advances.
Image credits: Runway