Gentrace makes it easier for businesses to test AI-powered software

As businesses continue to integrate generative AI into their products, many find it challenging to actually test whether the AI is behaving correctly and giving useful answers.

To help address this problem, a startup called Gentrace offers an integrated platform for testing software built around large language models. Traditional software can be subjected to automated tests that verify, say, that data submitted to a web form ends up properly formatted in a database. AI-powered software, by contrast, often can't be expected to behave in an exactly specified way in response to input, says Gentrace cofounder and CEO Doug Safreno.

Customers typically end up defining a set of test data for the AI and rerunning it after any change to the AI model, the databases it interacts with, or other parameters. But without a testing platform, running those tests can mean maintaining spreadsheets of AI test prompts and manually logging whether the results are satisfactory. Automation is possible, such as verifying that an AI response contains certain keywords or asking another AI system to judge whether a response looks satisfactory. But complex testing often requires engineers to be heavily involved, even though other team members, like product managers, may know better what good output looks like, Safreno says.
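The simplest form of automation described above, checking an AI response for required keywords, can be sketched in a few lines. This is a generic illustration, not Gentrace's actual API; the prompts and keywords are invented.

```python
def contains_keywords(response: str, required_keywords: list[str]) -> bool:
    """Pass if every required keyword appears in the response (case-insensitive)."""
    text = response.lower()
    return all(kw.lower() in text for kw in required_keywords)

# Hypothetical test cases a team might otherwise track in a spreadsheet.
test_cases = [
    {"prompt": "Summarize our refund policy", "keywords": ["30 days", "receipt"]},
    {"prompt": "List supported payment methods", "keywords": ["credit card"]},
]

response = "Refunds are accepted within 30 days with a receipt."
print(contains_keywords(response, test_cases[0]["keywords"]))  # True
```

Checks like this are brittle, which is exactly why teams also reach for human graders and LLM-based judges, as described below.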

“The problem becomes, nobody can look at it and collaborate on these tests and on these evaluation methods,” he says. “As new product requirements come in, they’re not being captured in the testing.”

To help make AI testing more accessible, Gentrace's platform enables anyone within a company to see, edit, and run tests for LLM-powered systems. The results can then be graded by human evaluators, simple programs, or even more LLMs. Gentrace provides guidance on using LLMs efficiently to test AI output, which Safreno says often involves giving the testing LLMs an "unfair advantage": more detail about the desired output than the original prompt contained. The tool also provides an interface for prompting human evaluators to consider an AI response.
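The "unfair advantage" idea can be sketched as a grading prompt that includes a reference answer the original model never saw. This is an invented illustration of the general technique, not Gentrace's implementation, and the question, answer, and reference text are hypothetical; in practice the assembled prompt would be sent to any chat-completion model.

```python
def build_judge_prompt(question: str, answer: str, reference: str) -> str:
    """Assemble a grading prompt that gives the judge a reference answer."""
    return (
        "You are grading an AI response.\n"
        f"Question: {question}\n"
        f"Response to grade: {answer}\n"
        f"Reference answer (not shown to the original model): {reference}\n"
        "Reply PASS if the response matches the reference in substance, "
        "otherwise FAIL."
    )

prompt = build_judge_prompt(
    question="What is our refund window?",
    answer="Customers can get a refund within 30 days.",
    reference="Refunds are available for 30 days after purchase with a receipt.",
)
print(prompt.splitlines()[0])  # You are grading an AI response.
```

Because the judge holds information the graded model lacked, it can flag substantive misses that a keyword check would let through.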

[Image: Gentrace]

Anna Wang, head of AI at AI-powered workforce training company Multiverse, says Gentrace’s system eliminated the need to pass around documents of AI input and output to evaluate the system’s performance. 

“What this replaced were tons and tons of spreadsheets,” she says. “Gentrace has this slick UI that plugs straight into our AI code.”

And as of Tuesday, Gentrace is offering a new feature called Experiments that gives users even more power to test entire applications from the Gentrace interface. With Experiments, users can specify parameters for a test run like data sets to access, prompts to AI systems, and database configuration settings. With simple initial tweaks to their code, developers can mark particular variables as editable within Gentrace, and teammates with no coding knowledge can then set them as desired for a particular test run. Test reports within Gentrace log what’s already been tried in prior tests and how the software performed. 

“We just wrap, end-to-end, your application, no matter how you’ve architected it, which means we can measure the impact of any change,” says Safreno. “You could have 20 models chained together, generating an output, and you could tweak one prompt along the way, and we could measure the impact of that.”

The company also on Tuesday announced an $8 million Series A funding round led by Matrix Partners, with additional participation from Headline and K9 Ventures. The new investment will fund additional product development, which Safreno says may one day enable AI—as well as humans—to design tests for LLM-powered applications, like searching through potential prompts or other settings to find the best performing options for an app, or generating new test cases to evaluate performance.  

Future versions of Gentrace Experiments will likely also include the ability to experiment with different potential settings, then directly deploy the best-performing options to live code. But even the current version is likely to make AI development more efficient, Safreno says, by reducing the amount of engineer time and coordination required to run basic tests.

“It’s taking out this enormous loop between multiple stakeholders that just doesn’t need to exist,” he says.

https://www.fastcompany.com/91243257/gentrace-makes-it-easier-for-businesses-to-test-ai-powered-software?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Published Dec. 10, 2024, 19:40:08

