An AI watchdog accused OpenAI of using copyrighted books without permission

An artificial intelligence watchdog is accusing OpenAI of training its default ChatGPT model on copyrighted book content without permission.

In a new paper published this week, the AI Disclosures Project alleges that OpenAI likely trained its GPT-4o model using nonpublic material from O’Reilly Media. The researchers used a legally obtained dataset of 34 copyrighted O’Reilly books and found that GPT-4o showed “strong recognition” of the company’s paywalled content. By contrast, GPT-3.5 Turbo appeared more familiar with publicly accessible O’Reilly book samples.

“These results highlight the urgent need for increased corporate transparency regarding pre-training data sources as a means to develop formal licensing frameworks for AI content training,” the authors wrote in the paper. Tim O’Reilly, one of the paper’s authors, is a cofounder and CEO of O’Reilly Media.

An OpenAI spokesperson didn’t immediately respond to Fast Company’s request for comment.

Training data lies at the heart of all artificial intelligence models. Large language models (LLMs) require an enormous amount of information, which they draw on when generating text or images for users.

OpenAI has struck some licensing deals that allow it to train its models on certain content. But the company, which recently raised new funding and is valued at $300 billion, has also come under fire over how it sources certain content. The New York Times, for example, is leading a charge against OpenAI and minority owner Microsoft over alleged copyright infringement.

The researchers acknowledged limitations in their study but argued that the issue is likely part of a broader systemic problem in how large language models are developed.

“Sustainable ecosystems need to be designed so that both creators and developers can benefit from generative AI,” the authors wrote. “Otherwise, model developers are likely to rapidly plateau in their progress, especially as newer content becomes produced less and less by humans.”


https://www.fastcompany.com/91310223/an-ai-watchdog-accused-openai-of-using-copyrighted-books-without-permission?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created 26d ago | 2. 4. 2025 20:30:07

