The makers of the AI image generator Stable Diffusion are getting into generative AI video with a new product called Stable Video Diffusion.
There are many different Stable Diffusion AI image generator models, and the company, Stability AI, is sticking with that practice by releasing two AI video generator models: SVD and SVD-XT.
SVD can generate 14 frames of video and SVD-XT can generate 25 frames. The frame rate can be set anywhere between three and 30 frames per second, at a resolution of 576×1024.
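How long a clip runs follows from those figures: duration is the number of frames divided by the frame rate. The short Python sketch below works that out for both models, assuming playback at a constant frame rate. The function name `clip_duration_seconds` and the `FRAME_COUNTS` table are made up for illustration and are not part of any Stability AI code.

```python
# Illustrative arithmetic only. The frame counts and frame-rate limits are
# the figures quoted in the article; every name here is hypothetical.
FRAME_COUNTS = {"SVD": 14, "SVD-XT": 25}
MIN_FPS, MAX_FPS = 3, 30


def clip_duration_seconds(frames: int, fps: int) -> float:
    """Return the clip length in seconds for a constant frame rate."""
    if not MIN_FPS <= fps <= MAX_FPS:
        raise ValueError(f"fps must be between {MIN_FPS} and {MAX_FPS}")
    return frames / fps


if __name__ == "__main__":
    for name, frames in FRAME_COUNTS.items():
        shortest = clip_duration_seconds(frames, MAX_FPS)
        longest = clip_duration_seconds(frames, MIN_FPS)
        print(f"{name}: {shortest:.2f}s at {MAX_FPS} fps, "
              f"{longest:.2f}s at {MIN_FPS} fps")
```

On these assumptions, a 14-frame clip lasts between roughly half a second and just under five seconds, and a 25-frame clip between a little under one second and a little over eight.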
The models are currently available for research purposes only, and those wanting to use Stable Video Diffusion have to contact Stability AI to request waitlist access.
“This state-of-the-art generative AI video model represents a significant step in our journey toward creating models for everyone of every type,” the company writes on its website.
On the company’s Hugging Face page, it admits that Stable Video Diffusion is still somewhat limited.
“The generated videos are rather short (four seconds), and the model does not achieve perfect photorealism,” says Stability AI.
“The model may generate videos without motion, or very slow camera pans. The model cannot be controlled through text. The model cannot render legible text. Faces and people in general may not be generated properly.”
Engadget notes that the tool was trained on a dataset of millions of videos and then fine-tuned on a smaller set.
The source of the data is a touchy issue: for its AI image generator, Stability AI used the LAION dataset for at least some of its training data, a choice that has led to a lawsuit with Getty Images.
For its video model, Stability AI has only said that it used “publicly available” data, a common term that artificial intelligence companies use to describe where they get their training data from.
Stable Video Diffusion will be looking to compete with the likes of Runway ML in a market that hasn’t taken off in the way AI images have.
That’s largely because the quality of the technology is not there yet, but it is likely to improve vastly in the coming years. Alongside the obvious benefits for media creation comes the potential for deepfakes and copyright violations.
Image credits: Stability AI