During yesterday’s earnings call on Meta’s fourth-quarter results, Mark Zuckerberg made it clear he will use images posted on Facebook and Instagram to train his generative AI tools.
In a veiled reference to Midjourney and OpenAI’s DALL-E, Zuck contrasted other AI companies’ data sources with his own.
“When people think about data, they typically think about the corpus that you might use to train a model up front,” Zuckerberg said.
“On Facebook and Instagram, there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the Common Crawl dataset and people share large numbers of public text posts in comments across our services as well.”
The Common Crawl dataset was employed by OpenAI to build their wildly popular AI apps. Essentially, Zuckerberg is saying that Meta doesn’t need services like Common Crawl, or LAION-5B — an open-source index of online images and captions — because he already has access to that type of mass data.
Zuck had good news to report on Thursday: profits tripled in the last quarter, and Meta’s share price jumped 20 percent today.
He made it clear in his earnings call that the company is investing heavily in artificial intelligence and virtual reality.
On AI, Zuckerberg was bullish, saying that he is “playing to win” in a space where the other players include Google, OpenAI, and Microsoft.
Last month, Meta announced a standalone AI image generator to compete with the likes of DALL-E and Midjourney.
Meta has already admitted that it has used what it calls “publicly available” data to train its AI tools.
Essentially, if you have a public Facebook or Instagram profile where you post photographs, there is a strong chance that Meta is using your work to train its AI image generator tools.
AI training data has been a hot topic for some time now, with photographers and artists expressing displeasure at how AI companies currently operate.
Meta President of Global Affairs Nick Clegg admitted he expects “a fair amount of litigation” to determine whether using copyrighted materials to train an AI is protected by fair use.
“Whether creative content is covered or not by existing fair use doctrine… We think it is, but I strongly suspect that’s going to play out in litigation,” he said in September.