YouTube CEO Says it is a Problem if OpenAI Scraped Videos for Sora

YouTube CEO Neal Mohan. | Collision Conference

YouTube has warned OpenAI that using videos from its platform to train Sora, OpenAI’s forthcoming AI video generator, would break its rules.

Speaking to Bloomberg, YouTube CEO Neal Mohan said that taking videos from YouTube to train an AI model would be an infraction of the platform’s terms of service (ToS).

“From a creator’s perspective, when a creator uploads their hard work to our platform they have certain expectations,” says Mohan. “One of those expectations is that the terms of service are going to be abided by.”

Mohan explains that some YouTube content can be scraped for open web purposes but that video transcripts and footage cannot. “That is a clear violation of our ToS so those are the rules of the road in terms of content on our platform,” he adds.

It comes after OpenAI’s CTO Mira Murati said that Sora was trained on “publicly available” data, an opaque term that could include YouTube videos.

“We used publicly available data and licensed data,” Murati told the Wall Street Journal.

“So videos on YouTube?” the reporter asked. “Videos from Facebook, Instagram? What about Shutterstock? I know you guys have a deal with them.”

“I’m actually not sure about that. If they were publicly available, publicly available to use, there might be that data, but I’m not sure. I’m not confident about it,” Murati responded.

“I’m just not going to go into the details of the data that was used, but it was publicly available or licensed data.”

Sora and Training Data

Sora has yet to be released, but OpenAI has shared several previews, including giving visual artists access to make short films and showing off an AI-generated music video.

The debate around large generative AI models and training data is one that won’t go away, with Getty Images still embroiled in a lawsuit against Stability AI, the maker of Stable Diffusion, for using 12 million of its copyrighted photos.

But despite accusations of copyright theft on an unprecedented scale, media reports indicate that generative AI companies are still desperate for data to build their machines.