New Bill Would Require AI Companies to Disclose Copyrighted Training Data

Rep. Adam Schiff

A new bill introduced by Rep. Adam Schiff would require all AI companies to disclose the copyrighted works used in training sets or face a fine.

At present, what exactly is used to train generative AI, specifically image generators, is murky at best. Few companies explicitly state what data was used to train their models, and even when that information is shared, it is often obscured or poorly explained.

For example, during an interview with the Wall Street Journal last month, OpenAI’s CTO Mira Murati feigned ignorance or dodged the question entirely when asked directly what data the company had used for training.

“We used publicly available data and licensed data… If they were publicly available, publicly available to use, there might be that data, but I’m not sure. I’m not confident about it,” Murati said at the time.

“I’m just not going to go into the details of the data that was used, but it was publicly available or licensed data.”

OpenAI has not clarified further since that interview, but if Rep. Schiff’s bill were to pass, OpenAI and every other generative AI company would be forced to provide that information to the public.

As reported by Billboard, the Generative AI Copyright Disclosure Act would be retroactive and would also encompass any new generative AI systems released in the future.

“AI has the disruptive potential of changing our economy, our political system, and our day-to-day lives. We must balance the immense potential of AI with the crucial need for ethical guidelines and protections,” Schiff says in a statement on his website.

“My Generative AI Copyright Disclosure Act is a pivotal step in this direction. It champions innovation while safeguarding the rights and contributions of creators, ensuring they are aware when their work contributes to AI training datasets. This is about respecting creativity in the age of AI and marrying technological progress with fairness.”

The bill would require a notice to be submitted to the Register of Copyrights for all current and future generative AI systems. The notice must include all copyrighted works used in building or altering the training dataset for that system.

The financial penalty for noncompliance would be determined on a case-by-case basis by the Copyright Office and would depend on factors such as the company’s size and any history of noncompliance.

Image credits: Header photo licensed via Depositphotos.