A new bill would force companies like OpenAI to disclose their training data

Artificial intelligence companies may have to become a lot more transparent about how they train their models if a new bill from Rep. Adam Schiff passes in Congress. Schiff has proposed the Generative AI Copyright Disclosure Act, which would require firms like OpenAI to list the copyrighted works they use to build generative-AI systems. The bill comes amid a growing outcry over the burgeoning industry's use of copyrighted materials to train its large language models, and it is the latest in a series of congressional pushes to regulate the technology and protect human content creators.

“AI has the disruptive potential of changing our economy, our political system, and our day-to-day lives,” Schiff said in a statement. “We must balance the immense potential of AI with the crucial need for ethical guidelines and protections. . . . This is about respecting creativity in the age of AI and marrying technological progress with fairness.”

The bill faces a potential uphill battle in Congress, as there has been plenty of gridlock when it comes to AI legislation. Some opponents worry that regulation would slow down the technology’s pace of expansion, potentially giving countries like Russia and China an advantage. Should it pass, though, here’s what you need to know about it.

What would the bill do?

Schiff’s bill would require companies to notify the Copyright Office before they launch an AI system. They would also be required to list “all copyrighted works used in building or altering the training dataset for that system.”

Is this bill just for new AI systems?

No. The bill’s rules would be retroactive, requiring generative-AI systems already on the market, such as OpenAI’s ChatGPT, to disclose where they got the data used to train their models. That’s something companies have been reluctant to discuss, particularly amid lawsuits from publishers like The New York Times. OpenAI CTO Mira Murati recently raised eyebrows when she claimed she was unsure whether the company’s Sora tool used data from YouTube, Facebook, or Instagram posts.

How far in advance would AI companies have to comply?

The bill mandates that the list of training data be submitted at least 30 days before the AI system is made available to the public. Any substantial changes to the training dataset after launch would also need to be reported.

What sort of penalties would AI companies face for noncompliance?

That’s unclear. The Copyright Office would determine how much companies would be fined, with the amounts depending on the company’s size and whether it has a history of ignoring the Act. Penalties would start at $5,000 and go up from there; the Act does not put a cap on the maximum assessment that can be charged.

Would this prevent AI companies from using copyrighted work?

Not directly, but it could bring some accountability to the table. Once the copyrighted works used for training are listed, copyright holders could check whether they gave permission for the use of their content and whether they were compensated for that usage.

Schiff’s legislative allies haven’t lined up yet, but several big names in the creative community are supporting the act. The Recording Industry Association of America has offered its support, as have the Directors Guild of America, SAG-AFTRA, ASCAP, and many other creative unions and organizations. (The support comes after Billie Eilish and 200 music artists signed an open letter criticizing AI and calling for an end to the use of AI in music creation.)

“This bill is an important first step in addressing the unprecedented and unauthorized use of copyrighted materials to train generative-AI systems,” said Meredith Stiehm, president of the Writers Guild of America West. “Greater transparency and guardrails around AI are necessary to protect . . . creators.”

https://www.fastcompany.com/91090357/generative-ai-bill-force-companies-like-openai-disclose-data-train-models
