Anthropic’s newest Claude chatbot beats OpenAI’s GPT-4o in some benchmarks

Anthropic rolled out its newest AI language model on Thursday, Claude 3.5 Sonnet. The updated chatbot outperforms the company’s previous top-tier model, Claude 3 Opus, while working at twice the speed. Claude users (including those on free accounts) can check it out beginning today.

Sonnet, which tends to be Anthropic’s most balanced model, is the first release in the Claude 3.5 family. The company says Claude 3.5 Haiku (the fastest in each generation) and Claude 3.5 Opus (the most powerful) will arrive later this year. (Those models will stay on version 3 in the meantime.) The Sonnet update comes only a few months after the arrival of the Claude 3 family, showcasing the breakneck speed AI companies are working to spit out their latest and greatest.

Chart showing benchmarks comparisons between recent AI chatbot models: Claude 3.5 Sonnet, Claude 3 Opus, GPT-4o, Gemini 1.5 Pro and Llama-400b.
Anthropic

Anthropic claims Claude 3.5 Sonnet marks a step forward in understanding nuance, humor and complicated prompts, and it can write in a more natural tone. Benchmarks (above) show the new model breaking industry records for graduate-level reasoning, undergraduate-level knowledge and coding proficiency. It beats OpenAI’s GPT-4o on many of the benchmarks Anthropic published. However, the latest Claude, ChatGPT, Gemini and Llama models tend to score within a few percentage points of each other on most tests, underscoring the tight competition.

The company claims Claude 3.5 Sonnet is also better at interpreting visual input than Claude 3.0 Opus. Anthropic says the new model can “accurately transcribe text from imperfect images,” a skill it hopes will attract customers in retail, logistics and financial services who need to grok data from charts, graphs and other visual cues. 

Claude’s update also brings a new workspace the company calls Artifacts (above). When you prompt the chatbot to generate content like code, text documents or web designs, a dedicated window appears to the right of the chat. From there, you can prompt Claude to make changes, and it will keep the Artifacts window updated with its latest output.

The company sees Artifacts as a first step towards making Claude a space for broader team collaboration. “In the near future, teams — and eventually entire organizations — will be able to securely centralize their knowledge, documents, and ongoing work in one shared space, with Claude serving as an on-demand teammate,” the company wrote in a press release.

Claude 3.5 Sonnet is available now for anyone with an account to try on its website, as well as in the Claude iOS app. (On both of those platforms, Claude Pro and Team subscribers get higher token counts.) You can also access it through the Anthropic API, Amazon Bedrock and Google Cloud’s Vertex AI. It costs $3 per million input tokens and $15 per million output tokens — the same as the previous model.

This article originally appeared on Engadget at https://www.engadget.com/anthropics-newest-claude-chatbot-beats-openais-gpt-4o-in-some-benchmarks-170135962.html?src=rss https://www.engadget.com/anthropics-newest-claude-chatbot-beats-openais-gpt-4o-in-some-benchmarks-170135962.html?src=rss
Creato 3mo | 20 giu 2024, 17:30:04


Accedi per aggiungere un commento

Altri post in questo gruppo

Threads is adding location sharing to posts

Threads seems to be rolling out a new location tagging feature that allows users to add a location to their posts. Some users have reported seeing the change in Threads’ app, though it doesn’t seem

28 set 2024, 00:20:15 | Engadget
OpenAI reportedly plans to increase ChatGPT's price to $44 within five years

OpenAI is reportedly telling investors that it plans on charging $22 a month to use ChatGPT by the end of the year. The company also plans to aggressively increase the monthly price over the next f

28 set 2024, 00:20:13 | Engadget
VR hit Walkabout Mini Golf is getting a mobile edition

Walkabout Mini Golf has been filled with players ever since it launched around the same time as the Meta Quest 2. Now the multiplayer mini-golf game is making the jump to iOS devices.

27 set 2024, 21:50:18 | Engadget
Valve's Deadlock lets you turn cheaters into frogs

Valve is continuing the wonderful tradition of messing with people who feel the need to cheat in multiplayer ga

27 set 2024, 19:40:15 | Engadget
Valve cuts binding arbitration from its Steam user agreement

If you booted up Steam in the last 24 hours, then you probably saw the pop up window asking you to agree to a new

27 set 2024, 19:40:14 | Engadget
Three men charged in connection with the Trump campaign hack

The US Department of Justice charged three Iranian nationals as part of an effort to hack into the emails and computers used by President Donald Trump’s campaign staff and other political connectio

27 set 2024, 19:40:12 | Engadget