As AI voices become more human, will stereotypes follow?

Google has worked hard to make its new Gemini AI assistant sound more human, but could that lead to people projecting racial and gender biases onto what they hear?

When Google prepared to give its new Gemini AI assistant the ability to speak, the company decided to give its 10 voice options celestial names. Voices like Orbit, Vega, and Pegasus aren’t just a nod to Gemini’s own constellation namesake, but also a way to sidestep preconceived notions around gender.

“We wanted to avoid gendered voices,” explains Françoise Beaufays, Google’s senior director of speech for Gemini Live. “If you look at the settings, we never make statements about gender.”

It’s a laudable approach, but it doesn’t stop Gemini’s users from anthropomorphizing the AI assistant in their mind’s eye. And as AI assistants increasingly sound like humans, one has to wonder: Are out-of-this-world names really enough to keep us from projecting our own biases about race and gender onto them?

Trying to avoid the Alexa trap

Ever since tech companies launched their first voice assistants, they have grappled with gender stereotypes. The first versions of Alexa, Siri, and the Google Assistant all used female-sounding voices by default, leading to criticism that these assistants were playing into existing notions around women as subservient helpers.

“Many of these [assistants] are not necessarily breaking free of the external stereotypes that exist in our society,” says Nicol Turner Lee, director of the Brookings Institution’s Center for Technology Innovation. “They’re just replicating some of the very same issues that we have.”

Assistants like Alexa have long told their users that they have no gender, but the general public clearly sees them as female—and also has a sense of how problematic that is. After Amazon released its Echo smart speaker in 2014, far fewer parents named their baby girls Alexa, in part to avoid a name associated with getting ordered around.

Rapid advances in artificial intelligence in recent years have made assistants a lot more conversational. Google’s Gemini, for instance, sounds much more like a person than a robot, down to its ability to hold what passes for insightful conversation about complex issues.

When asked about the tension between AI voices and gender stereotypes, Gemini had this to say: “The short answer is, it’s tricky. On one hand, you want your AI assistant to sound friendly and approachable, and sometimes that means using voice characteristics that might lean towards traditional gender stereotypes. On the other hand, you don’t want to reinforce those stereotypes either. One possible solution is to offer a wider range of voices so people can pick one they feel comfortable with.”

How Google built and chose Gemini’s voices

Unsurprisingly, Beaufays agrees—and she knows firsthand how tricky offering those choices can be. When her boss asked her to develop a range of voices for Gemini, the request was simply to make them sound “awesome.”

“That was a little scary,” admits Beaufays. Google’s prior assistant voices were optimized for clarity and easy-to-digest instructions, not moments of awe. “We had to really rethink [them] from scratch,” she says.

The company developed a new voice generation technology based on large language models, and then spent countless hours in professional recording studios to capture speech samples from a variety of voice actors. What followed was a long trial-and-error phase of attempting to turn those recordings into AI models. “So many of the models we trained we threw in the bin immediately,” Beaufays says.

Google made the final selection of voices, in part, with diversity in mind. “We had this hunch that voices are very personal,” says Beaufays. “If we built [only] two great voices, it may not be the two that matter to a specific person.” Instead, the Gemini team decided to offer 10 voices in total, with a variety of pitches, textures, and other characteristics.

“We wanted to make sure that every user would find their voice,” Beaufays says.

Why AI needs Black voices

That also includes acknowledging another complex issue: race. “I’m Black, and ever since I can remember, AI [assistants] have had white voices,” wrote a Reddit user earlier this year.

More recently, this has changed. Both OpenAI and Google’s Gemini now offer voice options that were trained on voice actors of color; Gemini’s Orbit voice, for instance, is easily identifiable as a Black voice. Turner Lee says that’s smart, noting, “People want to see themselves represented in these technologies. The voice gives some semblance of representation.”

However, using racially diverse voices can also surface existing biases. Earlier this year, OpenAI was forced to discontinue one of its voices over allegations that it sounded too much like Scarlett Johansson. Users who had chosen that voice found it replaced by a Black voice, leading some to allege that the company had opted for a “woke” replacement.

“I understand people lost access to their [voice of] choice, but that doesn’t excuse the racism,” wrote the aforementioned Reddit user. “I’ve seen so many people call [the Black voice] sassy, or ghetto, or calling her the ‘DEI’ replacement.”

Opting for an everyday voice over a celebrity

Google first embraced diverse voices for its Assistant in 2019, albeit with a different approach. To promote its smart speakers, the company briefly let users make a number of celebrities, including John Legend, the default voice of its assistant.

For Gemini, the company did not want to rely on celebrities. “We [tried] to find voices that represent everyday people in all their beauty as everyday people,” says Beaufays. “Voices that you could meet on the subway, I guess.”

Embracing everyday voices seems like a good first step toward dealing with biases in AI. However, Turner Lee cautions that using a Black voice actor alone doesn’t make an AI assistant inclusive, or even reflective of the diversity within that community.

“If tech companies want to authentically represent the linguistic capabilities and attributes of certain populations, then they need to involve them at the table, and in the design and deployment of these products,” she says. “They need to take this on as something that is truly part of their business, versus trying to guess or assume what people want as a superficial choice of the voices that they use.”
