An OpenAI ‘open’ model shows how much the company—and AI—has changed in two years

Welcome to AI Decoded, Fast Company's weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week here.

OpenAI says it will release an open-weight model, but why now?

OpenAI CEO Sam Altman said Monday that his company intends to release a “powerful new open-weight language model with reasoning” in the next few months. That would mark a major shift for a company that has kept its models proprietary and secret since 2019. The announcement wasn’t a total surprise: After the groundbreaking Chinese open-source model DeepSeek-R1 showed up in January, Altman said during a Reddit AMA that he realized his company was “on the wrong side of history” and suggested an OpenAI open-source model was a real possibility.

Open models typically come with a permissive license that requires little or no payment to the model developer. Open-weight models can be more cost-effective for corporations trying to leverage AI since they allow businesses to host (and secure) the models themselves—avoiding the often risky prospect of sending proprietary data through an API to a third-party provider and paying fees to do it. More businesses are moving in this direction—especially those holding sensitive user data in regulated industries.

The catch for model developers: corporate users don't have to pay to use an open model, so the labs releasing them need other ways to benefit. Some AI labs release open models to gain credibility in the market, potentially paving the way to eventually selling API access to their more powerful closed models. By releasing open models early on, the French AI company Mistral established itself as a top-tier AI lab and a legitimate alternative to U.S. players. Other labs release open-source models, then earn consulting fees by helping large enterprises deploy and optimize the models over time.

Meta's Llama models are the most widely deployed "open" models, though the company restricts reuse and redistribution and keeps the training data and code secret, so they don't meet the standard definition of open source. Meta had different reasons for giving away its models. Unlike Mistral and others, it makes money by surveilling users and targeting ads, not by renting out AI models. CEO Mark Zuckerberg continues funding Llama research because the models are a disruptive force in the industry and earn Meta the right to be called an "AI company."

OpenAI now has its own reasons for releasing an open-weight model. Eighteen months ago, OpenAI was the undisputed champion of state-of-the-art AI models. But in the time since, the releases of LLMs like Google's formidable Gemini 2.0 and DeepSeek's open-source R1 have cracked the competition wide open.

The market has changed, and OpenAI itself has evolved. Like Meta, OpenAI no longer depends solely on its models for revenue; selling access to them via an API is no longer the company's main business. Most of its revenue, not to mention its staggering $300 billion valuation, now comes from selling subscriptions to ChatGPT (most of them to individual consumers). OpenAI's real superpower is being a household-name consumer AI brand.

OpenAI will surely keep pouring massive resources into developing ever-better models, but its main reason for doing so is no longer to collect rent from developers for direct access to them; it's to keep making ChatGPT smarter for consumers.

AI video generation is getting scary good

AI-video-generation tools are rapidly leaping over the uncanny valley, making it increasingly difficult for everyday internet users to distinguish real video from generated video. That could bode well for smaller companies looking to produce glossy, creative, or ambitious ads at a fraction of the normal cost. But it could also spell trouble if bad actors use the technology in phishing scams or to spread disinformation. And it poses yet another threat to the livelihoods of people working in film.

The issue is back in the spotlight following several announcements, starting with Runway's release of its new Gen-4 video-generation system, which the company says produces "production ready" video.

AI startup Runway says the new system of models understands "much of the world's physics" (a claim supported by a video of a man being overtaken by an ocean wave). The company also touts improvements in video consistency and realism, as well as user control during the generation process. Runway posted a demo video of Gen-4's control tools that makes the production process look pretty easy, even for nontechnical users. Some of the finished video samples posted on X look somehow more real than real (see Jean Baudrillard, Simulacra and Simulation).

Runway faces stiff competition in the AI video space from perennial contenders including Google's Veo 2 model, OpenAI's Sora, Adobe Firefly, Pika, and Kling.

A new math benchmark aims to beat test question “contamination”

People in the AI community have been debating for some time whether our current methods of testing models' math skills are broken. The concern is that while existing math benchmarks contain some very hard problems, those problems (and their solutions) tend to get published online quickly, making the problem-solution sets fair game for AI companies sweeping up training data for their next models. Come evaluation time, the models may have already encountered the test problems and answers in their training data.

A new benchmark called MathArena was designed to eliminate those issues. MathArena draws its problems from very recent math competitions and Olympiads, whose organizers have obvious incentives to keep the problems secret ahead of time. The MathArena researchers also created their own standardized method of administering the evaluation, so AI model developers can't give their own models an edge via changes to the evaluation setup.

MathArena has just released the results of the most recent benchmark, which includes questions from the 2025 USA Math Olympiad. Here’s one of the questions: “Let H be the orthocenter of the acute triangle ABC, let F be the foot of the altitude from C to AB, and let P be the reflection of H across BC. Suppose that the circumcircle of triangle AFP intersects line BC at two distinct points, X and Y. Prove that C is the midpoint of XY.” Ouch. And to make matters worse, the test requires not only the correct answer but a description of each reasoning step the model took along the way.

The results are, well, ugly. Some of the most powerful and celebrated models in the world took the test, and none scored above 5%. The top score went to DeepSeek's R1 model, which earned 4.76%. Google's Gemini 2.0 Flash Thinking model scored 4.17%, Anthropic's Claude 3.7 Sonnet (Thinking) scored 3.65%, and OpenAI's most recent thinking model, o3-mini, scored 2.08%.

The results suggest one of several possibilities: Maybe MathArena contains far harder questions than other benchmarks, or LLMs aren’t great at explaining their reasoning steps, or earlier math benchmark scores are questionable because the LLMs had already seen the answers. Looks like LLMs still have some homework to do.


