Meta’s Llama 3.1 is open-source, kind of. Here’s how it could reshape the AI race

Meta today released a trio of new open-source large language models called Llama 3.1, the largest of which may lead to new chatbots that rival ChatGPT. In fact, Meta CEO Mark Zuckerberg believes the company’s Llama-powered AI assistant will be more widely used than ChatGPT by the end of this year. 

Llama 3.1 is actually a small family of models: Llama 3.1 405B, 70B, and 8B. (The numbers denote the parameter counts, in billions—that is, the learned weights applied at the models’ neuron-like connection points, where calculations are made.) The 405B model was trained on a massive amount of data: 15 trillion tokens, which represent words or word-parts. The tokens represent web data dating to 2024 (earlier models have been limited in their recency by cutoff dates, sometimes years in the past).
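A parameter count translates directly into hardware requirements. As a rough illustration—assuming 2 bytes per parameter, the common fp16/bf16 inference format, and ignoring overhead for activations and caches—the weights alone for each Llama 3.1 model would occupy:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold a model's weights.

    Assumes 2 bytes per parameter (fp16/bf16); real deployments need
    additional memory for activations and the key-value cache.
    """
    return num_params * bytes_per_param / 1e9

# Illustrative back-of-the-envelope figures for the Llama 3.1 family
for name, params in [("405B", 405e9), ("70B", 70e9), ("8B", 8e9)]:
    print(f"Llama 3.1 {name}: ~{weight_memory_gb(params):.0f} GB in fp16")
```

By this estimate the 405B model needs roughly 810 GB for its weights alone—far beyond a single GPU—while the 8B model, at about 16 GB, fits on one high-end consumer card.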

The 405B model was trained using 16,000 of Nvidia’s H100 graphics processing units. State-of-the-art “frontier” models are trained by processing large amounts of web-scraped, licensed, or synthetically generated text and image data. The new models can also call out (via APIs) to external tools and knowledge sources, such as up-to-date information, math expertise, and coding. 

Developers can download the new Llama models from Meta or from Hugging Face, or access them via major cloud services like AWS, Azure, and Databricks.

Meta calls the 405B version “the world’s largest and most capable openly available foundation model.” The company says the model beats OpenAI’s GPT-4 and GPT-4o, as well as Anthropic’s Claude 3.5 Sonnet, on commonly used benchmark tests, and “is competitive with” those models across a range of tasks. Meta believes developers will use its new Llama models to create more agentic chatbots, tools with greater reasoning capabilities, and better computer coding agents. 

The company also points to “synthetic data generation” and “model distillation” as examples of Llama 3.1 405B’s power. The former is the ability of one large model to create training data for a smaller model. The latter is the ability of a large model (a “teacher”) to transfer elements of its intelligence to a smaller (“student”) model. Meta says it altered its commercial license agreement to allow for these uses. This could have important implications for how models work together, and economic implications for the return on investment of smaller models. 
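In its simplest textbook form, distillation trains the student to match the teacher’s softened output distribution over next tokens. The following is a minimal sketch of that standard KL-divergence objective—not Meta’s actual training code; all logits and names here are illustrative:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution.
    A temperature > 1 'softens' the distribution, exposing more of the
    teacher's relative preferences between tokens."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the softened teacher and student
    distributions -- the quantity a student model minimizes so its
    outputs track the teacher's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 0.5, -1.0]   # hypothetical next-token logits from a large model
aligned = [2.1, 0.4, -0.9]   # a student that roughly agrees with the teacher
diverged = [-1.0, 2.0, 0.5]  # a student that disagrees

print(distillation_loss(teacher, aligned) < distillation_loss(teacher, diverged))  # True
```

The closer the student’s distribution is to the teacher’s, the smaller the loss—which is why a strong, openly available teacher like the 405B model matters economically: it can cheaply improve much smaller, cheaper-to-run students.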

But the model will also power some consumer use cases. It now powers Meta’s AI assistant at Meta.ai (for U.S. users, anyway) and within WhatsApp. 

The new models are text-based, not multimodal. But Zuckerberg says in a new video posted on Instagram that his company is working on next-gen models to power multimodal features such as an “Imagine” feature that creates images based on a photo of a person and a prompt (for example, “Imagine me playing soccer”). Zuckerberg says his company is also working on technology that will allow users to create their own AI apps and share them across the company’s social platforms. 

Over the past few years, as the AI race has heated up and attracted billions in investment dollars, companies have grown more and more secretive about how their models are built and how they work. 

Meta says it’s making the model weights publicly available through Hugging Face and a group of technology partners (including Nvidia), along with some new safety tools designed to make sure people don’t prompt the model to do harmful things. 

Open source advocates believe that AI can advance faster and better maintain safety if AI companies develop the burgeoning technology out in the open. Meta has long touted its commitment to open source, but many developers have noted that the company is open about only some aspects of its models. 

“Meta is continuing the industry standard of open-washing in AI,” says Nathan Lambert, a machine learning expert who works at The Allen Institute for AI. Lambert says Zuckerberg and Meta’s definition of open-source differs in spirit from the major proposed definitions currently being debated by institutional working groups (which Meta participates in). 

Meta’s definition of “open” seems to permit a lack of information on the data used to train the models. The parameter weights (generated during the model’s pre-training) released with a model are important, but the substance and curation of the training data play an equal role in the model’s performance, AI researchers have come to believe. “Meta’s release documents detail the data being ‘publicly available’ with no definition or documentation,” Lambert says. 

Scale AI CEO Alexandr Wang says his company, which produces and sculpts synthetic training data, provided a large amount of data used in the fine-tuning and reinforcement learning from human feedback (RLHF) of the new Llama models. 

Others say it’s the terms of Meta’s commercial usage license that fall short. “Meta isn’t open washing (per se) but Meta’s custom license and limits on usage does violate the ethos of open source,” Gartner analyst Arun Chandrasekaran tells Fast Company in an email. 

Despite this, Chandrasekaran believes Llama 3.1 will have real impact for both businesses and consumers. “[T]his will be a very useful model to a large set of enterprise clients,” he says, “and we can also expect Meta to push AI features more aggressively in its consumer products.”

The big picture is that Meta is, first and foremost, a very rich social media company that makes its money selling ads within social feeds. It has assembled an impressive organization of highly paid AI researchers who can develop models that help with important parts of Meta’s business, such as content moderation. But it’s also in a position to seed the growing AI ecosystem with its free models and tools, which could benefit both Meta’s influence and its bottom line in the future.

https://www.fastcompany.com/91161560/meta-releases-llama3-1-open-source-debate

Created 7mo ago | 23.07.2024, 21:30:09

