How Abeba Birhane is cleaning up AI’s dirty data

One day in 2020, Abeba Birhane found herself on Wikipedia, scouring a list of slurs. At the time, Birhane was pursuing a PhD in cognitive science at University College Dublin and was trying to see how many of those slurs appeared in the image descriptions of a massive data set that’s often used to train AI systems.

She had already turned up plenty of matches for the obvious filth, but Birhane was running out of ideas for what to search next. “The reason I went to Wikipedia is because I couldn’t think of enough slur words,” she says.

As the list of terms grew, so did Birhane’s findings, until she had amassed enough evidence to co-author a paper detailing just how rampant derogatory terms were within this important bit of technological infrastructure. That paper prompted the Massachusetts Institute of Technology, which housed the data set, to take it offline, and cemented Birhane’s position as a leading auditor of the data sets that feed the world’s increasingly sophisticated AI models. Now Birhane is continuing that work under a newly launched independent research lab of her own, called the AI Accountability Lab.

Birhane’s research focuses on the fact that AI models are trained on massive quantities of unfiltered data scraped from the open internet, much of which comes from hateful 4chan boards and misogynistic porn sites. Without proper safeguards in place, those AI models can end up replicating the same hate and misogyny when people prompt them for answers later on. In one recent paper, Birhane and her co-authors found that the bigger data sets get, the more likely the AI models trained on them are to produce biased results, such as classifying Black people as criminals.

“We are not evaluating systems for some hypothetical, potential risks in the future,” Birhane says. “These audits are uncovering actual real issues, real problems, whether it’s racism, sexism, or encoding of stereotypes and historical injustices and so on.” 

Birhane, who is from Ethiopia, says these questions about where data comes from and how it translates into biased outputs were not always top of mind in the research labs where she worked. “Traditional computer scientists tend to be male, white, or Asian. They would not think about how is Africanness represented? How are Black women represented?” she says. “My experience and background has effects in how I approach my audits.”

Her work couldn’t be more urgent. There are already plenty of examples of flawed AI systems wreaking havoc on people’s lives. In the U.K., the government used an algorithmic tool to approximate students’ grades after their exams were canceled due to the pandemic, and wound up giving students from disadvantaged schools worse grades than those from affluent ones. In the Netherlands, the government used an algorithm to predict people’s risk of defrauding the child benefits system and ended up penalizing tens of thousands of lower-income people, some of whom had their children taken away from them.

“In all of these examples, you find that the people who go to jail, the people who are disfranchised, the people who are dying, the people who are negatively impacted are often people at the very margins of society,” Birhane says. “This is the dire cost of not evaluating algorithmic systems before we deploy them.”


This story is part of AI 20, our monthlong series of profiles spotlighting the most interesting technologists, entrepreneurs, corporate leaders, and creative thinkers shaping the world of artificial intelligence.

