OpenAI wants to work with organizations to build new AI training datasets

OpenAI is rolling out a new partnership program to collect datasets from third parties that it intends to use to train its AI models. The initiative, OpenAI Data Partnerships, will seek large-scale private and public information that it says is “not already easily accessible online to the public.” The company says the data it will collect doesn't necessarily have to be quantitative or in text formats — the program will also accept images, audio or video.

Notably, the company says it's on the lookout for data on “any topic” and in “any language” so long as it “expresses human intention,” which it likens to long-form essays or transcribed conversations. Human-centric data collected by OpenAI is expected to help the company improve tools like its automatic speech recognition technology, which is used to transcribe spoken words. This initiative also lines up with ChatGPT’s recent expansion to support voice queries, which lets it engage with users in a conversational manner. Exposing its AI models to more data that teaches them how to hold human-like conversations should further improve this feature and the tools that follow it.

Announcing OpenAI Data Partnerships — help steer the future of AI by collaborating on public and private datasets with us. https://t.co/4tbi5SZ6sS

— OpenAI (@OpenAI) November 9, 2023

The model testing conducted throughout the data partnership program will also naturally expand the capabilities of OpenAI’s consumer-facing GPT-4 Turbo, which has been updated to provide users with more complex and meaningful responses. OpenAI says it has already started working with interested organizations, including authoritative bodies like the Icelandic government. Through curated datasets, OpenAI says it's working to improve GPT-4’s ability to comprehend queries made in the Icelandic language.

If a private or public organization wants to participate in the program, a representative can submit a form on the company’s website and share information on the type and size of the data they intend to share. There are two pathways for datasets. The first is the Open-Source archive, which is ideal for datasets relevant to training language models. However, submissions made to it will be public for anyone to use. Alternatively, OpenAI says a company can submit data through its private dataset pathway, where it will be used to train proprietary AI models, which the company says include its “foundation models” and “fine-tuned and custom models.” This is recommended for companies or institutions that want to keep their data confidential. Even so, OpenAI says it is not looking for datasets that contain sensitive or personal information.

ChatGPT has already set records for its soaring user base. It has about 100 million weekly active users around the world, meaning privacy will only continue to be a focal point for the tool. Previously, Samsung employees were put in the hot seat for leaking sensitive data to the AI model. While OpenAI claims it does not use data generated by its API to train its models unless a user explicitly submits information through an opt-in form, all eyes will be on how the company handles the data collected through this initiative, especially the private datasets.

This article originally appeared on Engadget at https://www.engadget.com/openai-wants-to-work-with-organizations-to-build-new-ai-training-datasets-214548902.html?src=rss
Created 1y | Nov 9, 2023, 11:50:21 PM

