There’s a cat-and-mouse game between those using generative AI chatbots to produce text undetected and those trying to catch them. Many believe they know the telltale signs—though as a journalist fond of the word “delve” and prone to em-dashes, I’m not so sure.
Researchers at four U.S. universities, however, have taken a more rigorous approach, identifying linguistic fingerprints that reveal which large language model (LLM) produced a given text.
“All these chatbots are coming out every day, and we interact with them, but we don’t really understand the differences between them,” says Mingjie Sun, a researcher at Carnegie Mellon University and lead author of the study, which was published on arXiv, the preprint server hosted by Cornell University. “By training a machine learning classifier to do this task, and by looking at the performance of that classifier, we can then assess the difference between different LLMs.”
Sun and his colleagues developed a machine learning classifier that analyzed the outputs of five popular LLMs and could tell them apart with 97.1% accuracy. The classifier also surfaced verbal quirks distinctive to each LLM.
ChatGPT’s GPT-4o model, for instance, tends to use “utilize” more than other models. DeepSeek is partial to saying “certainly.” Google’s Gemini often prefaces its conclusions with the word “essentially,” while Anthropic’s Claude overuses phrases like “according to” and “according to the text” when citing its sources.
xAI’s Grok stands out as more discursive and didactic, often reminding users to “remember” key points while guiding them through arguments with “not only” and “but also.”
“The writing, the word choices, the formatting are all different,” says Yida Yin, a researcher at the University of California, Berkeley, and a coauthor of the paper.
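To make the idea concrete, here is a toy sketch in Python that counts the marker phrases quoted above and guesses a source model from the tallies. It is not the researchers' method: the study trained a machine learning classifier, whereas this is a hand-written phrase counter. The `MARKERS` table, the function names, and the "most hits wins" rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Marker phrases as quoted in the article. Treating them as a lookup
# table is an illustrative assumption, not the study's approach.
MARKERS = {
    "GPT-4o": ["utilize"],
    "DeepSeek": ["certainly"],
    "Gemini": ["essentially"],
    "Claude": ["according to"],
    "Grok": ["remember", "not only", "but also"],
}


def marker_counts(text):
    """Count how often each model's marker phrases appear in the text."""
    lowered = text.lower()
    counts = Counter()
    for model, phrases in MARKERS.items():
        for phrase in phrases:
            # Leading word boundary only, so "utilize" also matches
            # "utilizes" and "utilized".
            pattern = r"\b" + re.escape(phrase)
            counts[model] += len(re.findall(pattern, lowered))
    return counts


def guess_model(text):
    """Return the model with the most marker hits, or None if there are none."""
    model, hits = marker_counts(text).most_common(1)[0]
    return model if hits > 0 else None


if __name__ == "__main__":
    sample = "Remember, this approach is not only fast but also cheap."
    print(marker_counts(sample))
    print(guess_model(sample))
```

A counter like this is easy to fool and says nothing about formatting or sentence structure, which Yin notes also differ between models. A trained classifier can weigh many such signals at once.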
These insights can help users select the best model for specific writing tasks—or aid those trying to catch AI-generated text masquerading as human work. So, remember: according to this study, if a model utilizes certain words, it’s certainly possible to identify it.