Here’s Why AI Companies Think They Can Use Photographers’ Work Without Compensation
After the U.S. Copyright Office put out a call for opinions on how copyright should work with AI-generated material, the world’s biggest AI companies had a lot to say.
Unsurprisingly, Google, Meta, OpenAI, and Stability AI do not believe that they should be compensating photographers and artists whose work they used to train their generative AI tools.
The public consultation closed on October 18, and the nearly 10,000 comments are available to view online.
As noted by The Verge, the companies differ on why they shouldn’t have to pay to train their AI models, but the conclusion is the same: they don’t think they should pay.
OpenAI – Makers of DALL-E and ChatGPT
“OpenAI believes that the training of AI models qualifies as a fair use, falling squarely in line with established precedents recognizing that the use of copyrighted materials by technology innovators in transformative ways is entirely consistent with copyright law…
“When a model is exposed to a large array of images labeled with the word “cup”, it learns what visual elements constitute the concept of “cup-ness”, much like a human child does. It does this not by compiling an internal database of training images, but rather by abstracting the factual metadata that correlates to the idea of “cup”. This enables it to then combine concepts and produce a new, entirely original image of a “coffee cup,” or even “a coffee cup that is also a portal to another dimension.”
“The factual metadata and fundamental information that AI models learn from training data are not protected by copyright law. Copyright law does not protect the facts, ideas, scènes à faire, artistic styles, or general concepts contained in copyrighted works. And when technical realities require that copyrighted works be reproduced in order to extract and learn from these unprotectable aspects of a work, courts have routinely found those reproductions to be permissible under the fair use doctrine.”
“If training could be accomplished without the creation of copies, there would be no copyright questions here. Indeed that act of “knowledge harvesting,” to use the Court’s metaphor from Harper & Row, like the act of reading a book and learning the facts and ideas within it, would not only be non-infringing, it would further the very purpose of copyright law. The mere fact that, as a technological matter, copies need to be made to extract those ideas and facts from copyrighted works should not alter that result.”
Stability AI – Makers of Stable Diffusion
“Models learn behaviors, they do not store works. Through training, these models develop an understanding of the relationship between words, concepts, and fundamental visual, textual, or musical features. The model doesn’t rely on any single work in the training data, but instead learns by observing recurring patterns over vast datasets (billions of image and caption pairs, and hundreds of billions or trillions of words). The model does not store the material in this training data. They do not “collage” or “stitch” together original works, nor do they operate as a “search engine” for existing content…
“A range of jurisdictions including Singapore, Japan, the European Union, the Republic of Korea, Taiwan, Malaysia, and Israel have reformed their copyright laws to create safe harbors for AI training that achieve similar effects to fair use. In the United Kingdom, the Government Chief Scientific Advisor has recommended that “if the government’s aim is to promote an innovative AI industry in the UK, it should enable mining of available data, text, and images (the input) and utilise [sic] existing protections of copyright and IP law on the output of AI.”
Meta
“The process of training and developing AI models does not necessarily trigger the rights that copyright exists to protect…
“The extraction of unprotectable facts and ideas from copyrighted works is not itself an infringement of copyright, whether that extraction is accomplished by a human being (by, for example, learning from a book) or by a technological process…
“The American AI industry is built in part on the understanding that the Copyright Act does not proscribe the use of copyrighted material to train Generative AI models. That understanding flows directly from the fact that model training is a quintessentially non-exploitive use of training material. As explained above, the purpose and effect of training is not to extract or reproduce the protectable expression in training data, but rather to identify language patterns across a broad body of content. Doing so does not implicate any of the legitimate rightsholder interests that copyright law exists to protect.
“Imposing a first-of-its-kind licensing regime now, well after the fact, will cause chaos as developers seek to identify millions and millions of rightsholders, for very little benefit, given that any fair royalty due would be incredibly small in light of the insignificance of any one work among an AI training set.”