The companies racing to own the AI future have built their technologies and businesses by carefully keeping track of you and your data. But they would prefer you didn't keep track of them, specifically of the ways they've been adjusting their voluntary ethical and privacy commitments, some of the few safeguards meant to keep that AI future safe.
As the Trump administration actively dismantles safety guardrails to promote “American dominance” in AI and companies disband their safety teams, it’s fallen to a tiny nonprofit with limited resources to track how these trillion-dollar companies are adjusting their policies and honoring their own ethical commitments.
Tyler Johnston and his group, the Midas Project, have become the digital world's equivalent of a one-person fire department trying to monitor a forest of potential blazes. Launched in mid-2024, the nonprofit's "AI Safety Watchtower" project now tracks 16 companies, including OpenAI, Google, and Anthropic, monitoring hundreds of policy documents and web pages for changes.
“If every AI company had a change log, this work would be unnecessary,” says Johnston. “That would be the ultimate transparency. Instead, it’s up to nonprofits and journalists to monitor this, and nobody’s well-equipped enough to catch all of it.”
The federal government, meanwhile, has been moving in the opposite direction. On his second day in office this term, Trump signed an executive order revoking Biden's 2023 AI safety order, replacing it with one focused on "American dominance" in AI. In March, the National Institute of Standards and Technology issued new directives to scientists at the Artificial Intelligence Safety Institute that eliminated mentions of "AI safety," "responsible AI," and "AI fairness."
While various states have taken steps to pass AI regulation and bills have been proposed on Capitol Hill, there are as yet no federal rules specifically governing the use of the technology. In recent weeks, Trump’s Office of Science and Technology Policy solicited public comments from companies, academics, and others for a forthcoming “AI action plan”; Silicon Valley, not surprisingly, has urged a light regulatory touch.
Johnston came to AI ethics from animal welfare advocacy, where targeted campaigns successfully pushed food companies to adopt cage-free-egg practices. He hoped to replicate that success by becoming the “bad cop” willing to pressure tech giants.
With about 1,500 followers across two X accounts, Johnston runs the Midas Project full time, with the Safety Watchtower taking up about 5% of his time. The group operates on a shoestring budget, so he's DIYing a lot of the work for now, with some help from volunteers.
Johnston isn’t backed by billions in venture capital or government funding—just determination and a basic web-scraping tool that detects when companies quietly delete promises about not building killer robots or enabling bioweapons development.
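The mechanics of such a tool are simple enough to sketch. The Midas Project hasn't published its scraper, so the following is only a minimal illustration of the general approach, assuming Python with the third-party requests library; the tracked URLs and the snapshots directory are placeholders, not the project's real configuration.

```python
import difflib
import hashlib
import pathlib

import requests  # third-party: pip install requests

SNAPSHOT_DIR = pathlib.Path("snapshots")  # hypothetical local cache

# Illustrative targets only -- not the Midas Project's actual watch list.
TRACKED_PAGES = [
    "https://openai.com/policies/usage-policies",
    "https://www.anthropic.com/transparency",
]


def check_page(url: str) -> None:
    """Fetch a page and print a unified diff against the last snapshot."""
    SNAPSHOT_DIR.mkdir(exist_ok=True)
    # One snapshot file per URL, named by a hash of the URL.
    snapshot = SNAPSHOT_DIR / (hashlib.sha256(url.encode()).hexdigest() + ".txt")
    new_text = requests.get(url, timeout=30).text

    if snapshot.exists():
        old_text = snapshot.read_text()
        if old_text != new_text:
            print(f"CHANGE DETECTED: {url}")
            for line in difflib.unified_diff(
                old_text.splitlines(),
                new_text.splitlines(),
                fromfile="previous",
                tofile="current",
                lineterm="",
            ):
                print(line)

    snapshot.write_text(new_text)  # store current version for the next run


if __name__ == "__main__":
    for page in TRACKED_PAGES:
        check_page(page)
```

A production watcher would also strip HTML markup and page chrome before comparing, since raw pages change constantly for reasons unrelated to policy language, and would queue detected diffs for human review rather than treating every change as meaningful.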
So far, the Watchtower has documented about 30 significant changes, categorizing them with the tags "major," "slight," and "unannounced." The first was OpenAI's "slight" modification of its "core values" in October 2023, when the company removed values such as "impact-driven," which emphasized that employees "care deeply about real-world implications," and replaced them with values such as "AGI focus."
Another "slight" policy change caught by the Watchtower came from Meta in June 2024, when the company made explicit that it can use data from Facebook, WhatsApp, and Instagram to train its AI models.
The Watchtower also flagged a “major” change by Google last month when the company released a new version of its Frontier Safety Framework. Johnston’s analysis revealed concerning modifications: model autonomy risks were removed and replaced with vaguely defined “alignment risks,” and notably, the company added language suggesting it would only follow its framework if competitors adopted similar measures.
At times, companies have responded to Johnston’s alerts. Earlier this month, Watchtower’s web scrapers noticed that Anthropic removed references to the “White House’s Voluntary Commitments for Safe, Secure, and Trustworthy AI” from its Transparency Hub webpage. But Anthropic cofounder Jack Clark clarified on X: “This isn’t a change in substance and has caused some confusion—we’re working on a fix. We continue to follow the White House Voluntary Commitments.”
The status of these commitments under the Trump administration remains unclear. The commitments were independent promises companies made to the Biden White House and the public about managing AI risks, meaning they shouldn't be affected by Trump's executive order rolling back Biden's AI policies.
Several companies, including Nvidia, Inflection, and Scale AI, confirmed they're still adhering to the commitments post-election, according to FedScoop. Anthropic eventually restored the reference to its website but added a curious disclaimer: "Though these specific commitments are no longer formally maintained under the Trump administration, our organization continues to uphold all these principles." The White House did not respond to a request for clarification.
In another case, a commitment flagged as removed from Anthropic’s website had simply been relocated to a different page. For Johnston, this highlights a broader issue with transparency in the industry: the companies, not journalists, should be clear about how and when their policies are changing.
The most consequential shift Johnston has documented is AI companies reversing their stances on military work. According to Johnston, OpenAI's reversal was particularly calculated: the work was initially framed as helping prevent veteran suicide and supporting Pentagon cybersecurity, and critics were painted as heartless for questioning it. But by November 2024, OpenAI was developing autonomous drones, in what Johnston described as a classic foot-in-the-door strategy. Google followed suit earlier this year, revoking its own military restrictions.
“A lot of them are starting to really feel the global [AI] race dynamic,” Johnston says. “They’re like, ‘Well, we have to do this because if we don’t work with militaries, less scrupulous actors will.'”
The military pivot is just one example of how AI companies are reframing their ethical stances. OpenAI recently published a document outlining its philosophy on AI safety, claiming it has moved beyond the more cautious "staged deployment" approach it took with GPT-2 in 2019, when it initially withheld the model's full release, citing safety concerns.
“In a discontinuous world, practicing for the AGI moment is the only thing we can do, and safety lessons come from treating the systems of today with outsize caution relative to their apparent power. This is the approach we took for GPT‑2,” OpenAI wrote.
But Miles Brundage, OpenAI’s former head of policy research, publicly challenged this characterization, saying the company was rewriting the “history of GPT-2 in a concerning way.”
“OpenAI’s release of GPT-2, which I was involved in, was 100% consistent with OpenAI’s current philosophy of iterative deployment,” Brundage wrote on X. “The model was released incrementally, with lessons shared at each step. Many security experts at the time thanked us for this caution.”
Brundage fears OpenAI is now setting up a framework where “concerns are alarmist” and “you need overwhelming evidence of imminent dangers to act on them”—a mentality he calls “very dangerous” for advanced AI systems.
The pattern of changes extends beyond the companies’ own rules and policies. In February, Johnston’s team launched “Seoul Tracker” to evaluate whether companies were honoring promises made at the 2024 AI Safety Summit in Seoul. The results were damning: Many simply ignored the February deadline for adopting responsible scaling policies, while others implemented hollow versions that barely resembled what they’d promised.
Using a letter-grade scoring system based on public evidence of implementation across five key commitment areas, the Seoul Tracker gave Anthropic the highest score, a B-. Companies including IBM, Inflection AI, and Mistral AI received failing grades of F, having shown no public evidence that they had fulfilled their commitments.
“It’s wild to me,” Johnston says. “These were promises they made not just on some webpage, but to the governments of the United Kingdom and South Korea.”
Perhaps most telling about the impact of Johnston's work is who's paying attention. The Midas Project struggles to get 500 signatures on petitions asking AI companies to take security seriously, and its follower count is still relatively modest, but those followers include a who's who of AI luminaries, watchdogs, and whistleblowers.
Even a Trump White House advisor on AI recently followed the account. That got Johnston wondering whether government officials view these ethical reversals as progress rather than problems.
“I’m so worried that he’s following it like cheering it on,” he says, “seeing the changes as wins as these companies abandon their commitments.”
https://www.fastcompany.com/91304014/this-watchdog-is-tracking-how-ai-firms-are-quietly-backing-off-their-safety-pledges