How the U.S. chip bans led to a monster called DeepSeek

The Chinese AI company DeepSeek has put the AI industry in an uproar. Denied the most powerful chips thought necessary to create state-of-the-art AI models, DeepSeek’s researchers pulled off some engineering master strokes that allowed them to do more with less. The DeepSeek-V3 and DeepSeek-R1 models the company recently released achieved state-of-the-art performance in benchmark tests and took far less time and money to train and operate than comparable models.

And the cherry on top: The company’s researchers showed their work. They explained the breakthroughs in research papers and open-sourced the models so others can use them to build their own models and agents.

The main reason DeepSeek had to do more with less is that the Biden administration issued a series of chip export restrictions that barred U.S. chipmakers such as Nvidia from shipping their most powerful GPUs (graphics processing units, the go-to chips for training AI models) to China and certain other countries.

This effort started in October 2022 and has been updated and fine-tuned several times to close loopholes. Biden released an executive order shortly before leaving office further tightening restrictions. DeepSeek apparently played by the rules. It made do with H800 chips, which the U.S. allowed Nvidia to sell in China at the time, instead of the more powerful H100 that U.S. tech and AI companies use.

With less powerful chips, the researchers were forced to find ways of training and operating AI models using less memory and computing power. 

The DeepSeek models use a “mixture of experts” approach, which allows them to activate only the subset of the model’s parameters that specializes in a certain type of query. This economizes on computing power and increases speed. DeepSeek didn’t invent this approach (OpenAI’s GPT-4 reportedly uses it, as does Databricks’s DBRX model), but the company found new ways of using the architecture to reduce the computer processing time necessary during pretraining. That is the process in which the model works through huge amounts of data in order to optimize its parameters to correctly respond to user queries.
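
To make the idea concrete, here is a minimal sketch of mixture-of-experts routing in Python. It is a generic illustration, not DeepSeek’s implementation: the expert functions, the router weights, the count of eight experts, and the choice to run two of them per token are all assumptions made for the example.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: a tiny function standing in for a neural-network block."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, router_weights, experts, top_k=2):
    """Route one token through only the top_k highest-scoring experts."""
    # The router scores every expert with a cheap dot product.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    # Only the best-scoring experts run; the rest are skipped entirely.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    # Blend the chosen experts' outputs, weighted by their gate values.
    output = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](x)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen


random.seed(0)
num_experts, dim = 8, 4  # illustrative sizes, not DeepSeek's
experts = [make_expert(scale=i + 1) for i in range(num_experts)]
router_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(token, router_weights, experts, top_k=2)
print(f"ran {len(chosen)} of {num_experts} experts: {chosen}")
```

In this sketch, six of the eight experts never execute for the token, which is where the savings in computing power come from.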

In DeepSeek-R1, a reasoning model comparable to OpenAI’s o1 series of models (announced in September 2024), DeepSeek found ways of economizing during inference time, when the model is “thinking” through various routes to a good answer. During this process of trial and error, the system must collect and store more and more information about the problem and its possible solutions in its “context window” (its memory) as it works.

As the context window adds more information, the memory and processing power required leaps up quickly. Perhaps DeepSeek’s biggest innovation is dramatically reducing the amount of memory allocated to storing all that data. In general terms, the R1 system stores the context data in a compressed form, which results in memory savings and better speed without affecting the quality of the answer the user sees. 
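
A back-of-the-envelope sketch in Python shows why compressing that stored context matters. The article does not spell out the compression method, so this example simply assumes the model keeps one smaller vector per token in place of full “key” and “value” data. Every number in it (layer count, head count, vector sizes) is hypothetical and not taken from DeepSeek’s models.

```python
def kv_cache_bytes(tokens, layers, heads, head_dim, bytes_per_value=2):
    """Memory for an uncompressed cache: keys and values for every layer, head, and token."""
    return 2 * tokens * layers * heads * head_dim * bytes_per_value


def compressed_cache_bytes(tokens, layers, latent_dim, bytes_per_value=2):
    """Memory if each token instead stores one smaller vector per layer (an assumption)."""
    return tokens * layers * latent_dim * bytes_per_value


# Hypothetical model dimensions, chosen only for illustration.
LAYERS, HEADS, HEAD_DIM, LATENT_DIM = 32, 32, 128, 512

for tokens in (1_000, 10_000, 100_000):
    full = kv_cache_bytes(tokens, LAYERS, HEADS, HEAD_DIM)
    small = compressed_cache_bytes(tokens, LAYERS, LATENT_DIM)
    print(f"{tokens:>7} tokens: full {full / 2**30:.2f} GiB, "
          f"compressed {small / 2**30:.2f} GiB ({full // small}x smaller)")
```

Under these assumed dimensions, both caches grow in step with the number of tokens in the context window, but the compressed one stays a fixed fraction of the full size, so a long chain of “thinking” fits in far less memory.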

DeepSeek said in a research paper that its V3 model cost a mere $5.576 million to train. By comparison, OpenAI CEO Sam Altman said that the cost to train its GPT-4 model was more than $100 million.
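
The arithmetic behind that figure is simple. The sketch below assumes the inputs given in DeepSeek’s V3 technical report, roughly 2.788 million H800 GPU hours priced at an assumed rental rate of $2 per GPU hour. Those two inputs do not appear in this article, and the total covers only the training run itself, so treat the comparison as rough.

```python
# Inputs assumed from DeepSeek's V3 technical report, not from this article.
GPU_HOURS = 2_788_000          # total H800 GPU hours for the training run
RATE_PER_GPU_HOUR = 2          # assumed rental price in dollars

v3_cost = GPU_HOURS * RATE_PER_GPU_HOUR
print(f"Estimated V3 training cost: ${v3_cost:,}")

# Lower bound on GPT-4's training cost, from Sam Altman's "more than $100 million".
GPT4_COST_FLOOR = 100_000_000
ratio = GPT4_COST_FLOOR / v3_cost
print(f"GPT-4 cost at least {ratio:.1f} times as much to train")
```

By this estimate, GPT-4’s training bill was at least about 18 times larger than V3’s.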

Since the release of DeepSeek’s V3, developers have been raving about the model’s performance and utility. Consumers are also embracing a new DeepSeek chatbot (powered by the V3 and R1 models), which is now number one in Apple’s App Store ranking of free apps. (However, that success has attracted cyberattacks against DeepSeek and caused the company to temporarily limit new user registrations.)

For the past two years, the narrative in the industry has been that creating state-of-the-art frontier models requires billions of dollars, lots of the fastest Nvidia chips, and large numbers of top researchers. That assumption is now being challenged across the industry and in investment circles. As a result, Nvidia stock fell nearly 17% Monday as investors questioned how much demand there will be for the expensive GPUs. And it’s all happening because a small shop of Chinese researchers knew they’d need some big engineering breakthroughs in order to create state-of-the-art models using less than state-of-the-art chips.

https://www.fastcompany.com/91267968/how-the-biden-chip-bans-created-a-monster-called-deepseek?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created 3mo | Jan 28, 2025, 12:40:02 AM

