An AI watchdog accused OpenAI of using copyrighted books without permission

An artificial intelligence watchdog is accusing OpenAI of training its default ChatGPT model on copyrighted book content without permission.

In a new paper published this week, the AI Disclosures Project alleges that OpenAI likely trained its GPT-4o model using nonpublic material from O’Reilly Media. The researchers used a legally obtained dataset of 34 copyrighted O’Reilly books and found that GPT-4o showed “strong recognition” of the company’s paywalled content. By contrast, GPT-3.5 Turbo appeared more familiar with publicly accessible O’Reilly book samples.

“These results highlight the urgent need for increased corporate transparency regarding pre-training data sources as a means to develop formal licensing frameworks for AI content training,” the authors wrote in the paper. Tim O’Reilly, one of the paper’s authors, is a cofounder and CEO of O’Reilly Media.

An OpenAI spokesperson didn’t immediately respond to Fast Company’s request for comment.

Training data lies at the heart of all artificial intelligence models. Large language models (LLMs) require an enormous amount of information, which they draw on when generating text or images for users.

OpenAI has struck some licensing deals that allow it to train its models on certain content. But the company, which recently raised funding and is valued at $300 billion, has also come under fire over how it sources certain content. The New York Times, for example, is leading a charge against OpenAI and minority owner Microsoft over alleged copyright infringement.

The researchers acknowledged limitations in their study but argued that the issue is likely part of a broader systemic problem in how large language models are developed.

“Sustainable ecosystems need to be designed so that both creators and developers can benefit from generative AI,” the authors wrote. “Otherwise, model developers are likely to rapidly plateau in their progress, especially as newer content becomes produced less and less by humans.”


https://www.fastcompany.com/91310223/an-ai-watchdog-accused-openai-of-using-copyrighted-books-without-permission?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created 2 Apr 2025, 20:30:07

