Why OpenAI’s Sora has so much trouble depicting gymnasts

Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week here.

What gymnasts can teach us about Sora’s limitations

OpenAI made its Sora Turbo video generator available to its paying customers this week, and there’s been a flood of Sora art on social media, much of it very impressive. One thing the model isn’t good at is depicting gymnastics, and the reasons for that say a lot about the current challenges of training video generators.

Sora has proven to be basically incapable of creating videos of gymnasts, producing instead strange, shape-shifting humans that vault through the air and sometimes land with three legs or an extra head.

The reason for Sora’s failure, according to Menlo Ventures principal investor Deedy Das, lies in its sense of physics. (Das has created some of the best examples on X of Sora’s gymnastic shortcomings.)

Most generative video models are trained to predict new pixels on a screen, Das tells me, based on the pixels that lit up in the previous frames. This may sound similar to the way language models work, and that’s because, unlike other popular video generators, Sora uses the same transformer model architecture that large language models (LLMs) use. And, as with most LLMs, researchers hope that video generators, by guessing what the next set of pixels will be, will begin to form a generalized understanding of how the world works. This means forming an understanding of physics–the rules that govern the way objects move through space. That remains a work in progress for AI researchers.

“Some say that it would work if you throw more gymnastics data at the problem, but it seems unlikely that the right way to ‘learn physics’ is to give enough examples,” Das says. “But the rules of physics apply in so many arbitrary situations that may never be in the data.” 

Asked for its take on the challenge of gymnast videos, an OpenAI spokesperson says in an email that the current Sora model “may struggle to simulate the physics of a complex scene, and may not comprehend specific instances of cause and effect (for example: a cookie might not show a mark after a character bites it).”

With AI progress slowing down, is Google growing more cautious?

This week Google unveiled its newest family of AI models, collectively called Gemini 2.0. But the (biggest, most powerful) flagship Gemini 2.0 model won’t be released until next year. Instead Google this week released an experimental version of a smaller variant of Gemini 2.0 called Flash, and that model is restricted to text-only outputs. Google also announced upgrades to a couple of AI agents—Project Astra and Project Mariner—that will showcase Gemini 2.0’s power, but those remain experimental and are nowhere close to release. 

Google has good reason to be cautious about what it releases. The first wave of LLMs were text-only and had limited reasoning ability and autonomy. Gemini 2.0 represents a second, more agentic wave of models. Because they are multi-modal, autonomous, and can use tools, they are more capable in the hands of the well-intentioned and potentially more dangerous in the hands of bad actors. 

Google isn’t the only one to exhibit an abundance of caution around such technology. Back in May, OpenAI showcased a version of ChatGPT that fully utilizes the multimodal capabilities of the company’s GPT-4o model. Like Google’s Project Astra agent, it can talk about and reason over things it sees (through a smartphone camera) and hears (through the phone’s microphones) in real time. Seven months later, that product remains in the lab.

In 2021, Google was in a better position than any other company to bring generative AI to the world. But it delayed releasing new LLM-powered chatbots to the public out of fear that the new technology might violate privacy or be used to spread misinformation or slander. OpenAI beat Google to the punch with ChatGPT in late 2022, and thus began this years-long competition to develop and release new AI tools. Google was under pressure to keep up, despite its safety concerns, because its investors would (and did) punish the company if it didn’t. 

Now there are signs that the big leaps forward in generative AI won’t come from something as simple as scaling up model size and computing power. “I think the progress is going to get harder . . . the low-hanging fruit is gone,” Google CEO Sundar Pichai said recently at The New York Times’s DealBook Summit. “You’re definitely going to need deeper breakthroughs as we get to the next stage into hyper-speed development again.”

My point is not to criticize Google for premature product announcements (why not make your announcement next year when the Gemini 2.0 flagship is ready?), but to suggest that the pressure may be off Google to rush new AI products out the door, at least for now. It’s good that Google was forced off the sidelines by ChatGPT—its lawyers and risk managers shouldn’t have been allowed to keep the company from leading a new computing paradigm that was coming with or without it. But Wall Street shouldn’t be setting the timetable for launching new things either. If there is a slowdown going on, hopefully Google will reset its trust and safety agenda, and be realistic and transparent about when it can actually bring new things to market.

An inconvenient truth about the coming AI revolution?

I recently spoke to futurist Zack Kass, who formerly headed up OpenAI’s go-to-market operation, about the advent of AI and its likely effects on jobs. Kass firmly believes AI will eventually create many new jobs and lift the economy. “For every job that goes away, there will probably be a new job created in some interesting new way that we just cannot imagine,” he says. But there may be an interim period when humans simply lose jobs to machines, as in the last industrial revolution.

“The real problem with these industrial revolutions is that on the backside of them there is often incredible amounts of discontent because people, many people, have lost a sense of purpose and identity that they previously found in a job,” Kass says. What does it mean when a graphic designer can no longer believe that they bring some unique, human ability to their job?

“The risk in all this is simply that people will lack purpose, at least for a couple generations,” Kass adds. “I think it will be our cross to bear–figuring out what we do in the world where our work is . . . relatively less meaningful because the world is so much more robust.”


https://www.fastcompany.com/91245684/why-openai-sora-has-so-much-trouble-depicting-gymnasts?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created 13d ago | Dec. 12, 2024, 18:10:03

