A study has found that training AI image generators on AI-generated images produces bad results.
Researchers from Stanford University and Rice University discovered that generative artificial intelligence (AI) models need “fresh real data” or the quality of the output decreases.
This is good news for photographers and other creators, because the researchers found that synthetic images in a training data set amplify artifacts that make humans look less and less human.
In work led by @iliaishacked we ask what happens as we train new generative models on data that is in part generated by previous models.
We show that generative models lose information about the true distribution, with the model collapsing to the mean representation of data pic.twitter.com/OFJDZ4QofZ
— Nicolas Papernot (@NicolasPapernot) June 1, 2023
In the above graph, posted to X by research team member Nicolas Papernot, there is a dramatic fall away from the “true distribution” as the model loses touch with what it is supposed to be synthesizing, corrupted by the AI material within its data set.
AI Models Go MAD
The research team named this AI condition Model Autophagy Disorder, or MAD for short. Autophagy means self-consuming; in this case, the AI image generator is consuming the very material that it creates.
“Without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease,” the researchers write in the study.
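The self-consuming loop the researchers describe can be illustrated with a toy simulation. In the sketch below (an assumption-laden illustration, not the paper's actual experiment), the “generative model” is simply a Gaussian fitted to its training set; each generation it is retrained on its own samples, optionally mixed with a fraction of fresh real data. Without fresh data, the spread of the samples tends to collapse toward the mean, mirroring the diversity loss the study warns about.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_loop(fresh_fraction, generations=5000, n=200):
    """Toy autophagous loop (illustrative only): the 'model' is a
    Gaussian (mean, std) fitted to its training data; each generation
    it is retrained on its own samples, with fresh_fraction of the
    set replaced by fresh draws from the true N(0, 1) distribution."""
    data = rng.normal(0.0, 1.0, n)           # generation 0: real data
    for _ in range(generations):
        mu, sigma = data.mean(), data.std()  # "train" the model
        synthetic = rng.normal(mu, sigma, n) # sample from the model
        n_fresh = int(fresh_fraction * n)
        fresh = rng.normal(0.0, 1.0, n_fresh)
        data = np.concatenate([synthetic[: n - n_fresh], fresh])
    return data.std()                        # remaining diversity

print(run_loop(0.0))  # no fresh data: spread collapses toward zero
print(run_loop(0.2))  # 20% fresh real data: spread stays near 1
```

The collapse happens because each fit slightly underestimates the spread and the loop compounds that error over generations, while even a modest injection of real data keeps the distribution anchored, which is the intuition behind the quote above.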
What Does it Mean for the Future of AI?
If the research paper is correct, then AI cannot treat its own output as an endless fountain of training data. Instead, AI models will still need real, high-quality images to keep progressing. It means that generative AI will need photographers.
With picture agencies and photographers now very much alive to the fact that their intellectual property has been used en masse to train AI image generators, this technological quirk may force AI companies to license their training data.
Since the likes of DALL-E and Midjourney burst onto the scene a year ago, the companies behind the incredible new tools have insisted that they use “publicly available” data to train their models. But that includes copyrighted photos.
Even if they don’t face legal consequences for building the first iterations of AI image generators, for their future models they will most likely need the cooperation of image professionals.