As a tech journalist who writes and edits daily, I see AI-generated content as a real concern. Will AI eventually render me useless in the industry I’ve given years of my life to by replacing human-produced content with AI-generated copy?
It can be difficult to discern AI-generated content from the human-made kind. If you’re not looking out for the telltale signs, such as repeated phrases or odd shifts in tone, you could be fooled into thinking it’s genuine. That’s where AI detection tools come in.
Further reading: How not to get bamboozled by AI content on the web
In my never-ending quest to stay one step ahead of AI-generated content, I tried four different AI detection tools to see just how accurate and reliable their results were. As you might’ve already guessed from the headline of this article, most of these AI-detection tools produced diddly squat.
Narrowing it down
Before I dive in, if you type the search term “AI detection tool” into your browser of choice, you’ll be inundated with results. I knew there had to be some kind of vetting process.
So, instead of trying out every tool I could find, I narrowed it down to four. I selected each tool based on ease of access (no subscription needed) and a robust FAQ page that explains how the tool analyzes AI-generated content. Clear and direct language is a must for me, so I’ll be looking out for that too. As you can see in the example below, Grammarly displays this kind of information on the front page.
All of these tools use the same percentage-based ranking system: the higher the percentage, the likelier the content is AI-generated. It seems simple enough on the surface, but the approach struggles in practice.
‘Testing parameters’
I called this section “testing parameters,” but this is just an experiment for me, a kernel of curiosity I’m chasing. For this article, just so we’re clear, I used the tools on AI-generated text and not other forms of content like images or audio. AI-generated text is the type of content these tools are designed to analyze and decipher.
First, I asked Google Gemini to write a brief story about a wizard discovering a laptop for the first time. I asked for a short story because some AI detection tools will only analyze a certain amount of content. Here’s the story Google Gemini generated:
The old wizard, Alaric, was known for two things: his impressive beard, which flowed like a silver waterfall down to his knees, and his insatiable curiosity. So, when a strange, flat object materialized in the center of his summoning circle, he wasn’t alarmed. Intrigued, yes, but not alarmed.
He peered at the object. It was black and shiny, with a glowing symbol on the back that resembled a bitten apple. Cautiously, he poked it with his staff. The object sprang to life, emitting a soft hum and displaying a bright, rectangular window filled with strange symbols.
“By the Great Astral Dragon!” Alaric exclaimed, his beard bristling with excitement. “What manner of sorcery is this?”
He cautiously touched the window, and the symbols shifted and changed. He prodded more, and a moving picture appeared – a tiny, colorful bird flapping its wings. Alaric jumped back, startled.
“Is it… alive?” he wondered aloud, circling the object. He tried a spell, muttering, “Reveal thy secrets!” but nothing happened. He tried another, “Open sesame!” but still nothing. Frustrated, he slumped onto his stool, stroking his beard thoughtfully.
Suddenly, he noticed a small, flat area beneath the window with rows of even smaller symbols. Remembering the moving picture, he tentatively pressed one of the symbols with his finger. The symbol appeared on the bright window! He pressed another, and another, and soon, a word formed: “MAGIC.”
Alaric’s eyes widened. “It responds to touch! But… no magical aura…” He pondered this for a moment, then a mischievous glint appeared in his eye. He began rapidly pressing the symbols, a wide grin spreading across his face.
On the screen, a sentence appeared: “Is this a talking mirror? Can I order a dragon?”
Alaric chuckled, his beard shaking with mirth. “No, my friend,” he whispered to the strange object, “but you are something even more extraordinary.”
He spent the rest of the day exploring his new treasure, discovering its games, its moving pictures, and its endless sea of information. He learned it was called a “laptop,” a word he found rather amusing. He even managed to send a message to his old friend, the sorceress Elara, who lived in a neighboring kingdom.
“My dear Elara,” he wrote, “you absolutely must see this! It’s a magical device with moving pictures and talking letters! And it can even play music! Come quickly, and we shall explore its wonders together!”
As he sent the message, Alaric couldn’t help but think that this “laptop” might be the start of a whole new era of magic, one where technology and sorcery intertwined in ways no one could have ever imagined. And he, Alaric the wizard, would be at the forefront of this exciting new frontier.
Next, I copied and pasted the story (word-for-word) into each AI detection tool and ran it twice. I focused on the consistency (or inconsistency) of results across the two runs as well as the accuracy of the percentage ranking. This story is 100 percent AI-generated, so I wanted to see if the tools would get it right from the get-go. Only one of them got close.
The results
Below you’ll find the results of the AI detection tools I used for this experiment:
- AI Detector by Grammarly: 37 percent (first run), 37 percent (second run)
- GPTZero: 62 percent (first run), 62 percent (second run)
- QuillBot: 78 percent (first run), 78 percent (second run)
- Originality.ai: 100 percent (first run), 100 percent (second run)
These tools all offer a subscription model with more advanced AI detection features, but the basic AI detection is free to use, so that’s what I used and based my judgments on.
AI Detector by Grammarly
AI Detector by Grammarly first caught my eye because I already use Grammarly daily to catch misspellings while editing drafts. The popular brand is a familiar face in a sea of no-name AI detection tools, so I welcomed it with open arms.
Sigh. The experience was a mixed bag.
Grammarly’s tool gave me a consistent result (37 percent on both runs), but it was inaccurate by a large margin. The story is a complete fabrication by Google Gemini, so seeing such a low percentage was surprising.
I felt as though Grammarly kept pushing its subscription on me. I get it from a business standpoint, but as a regular person scrolling through the page, I was left with a bad taste in my mouth. Grammarly also suggested I use Grammarly Authorship, which it bills as a more reliable feature for detecting AI, but you need a Grammarly account to use it.
I’ll give Grammarly credit for its easy-to-navigate webpage, though. All you do is type or paste the text in question and click the green “Check Text” button. The box on the right will then check the text for AI generation and spit out a percentage ranking.
Grammarly also has a robust FAQ section if you scroll down the page. There you’ll find answers about how the tool detects AI content and a clear acknowledgment that no AI detection tool is foolproof. This whole section earns major brownie points from me, a woman who loves transparency and straight-to-the-point answers.
GPTZero
GPTZero ranked the story 62 percent AI-generated on both runs. That’s only slightly above 50 percent, a lukewarm result that isn’t particularly impressive. The breakdown chart, which identifies the sections of text that are AI-generated, isn’t as well-defined as QuillBot’s. GPTZero shows a percentage for human-written content, AI-written content, and mixed content, each identified by a color.
The mixed content, I assume, is a combination of AI-generated and human-generated content, but I wish GPTZero were explicit here; QuillBot identifies this better with its “human-written and AI-refined” label. GPTZero’s advanced scan goes into more detail by flagging passages as “low human impact,” “medium AI impact,” and so on.
Only the basic scan is available without an account. For access to the advanced scan, you’ll need to make an account with GPTZero, which inspired cantankerous feelings inside me. Still, for the sake of this article, I “logged in” using my personal Gmail account and checked out the advanced scan. Turns out you only get five free scans before (you guessed it) GPTZero asks you to upgrade your account and pay a monthly fee of $23.99. Talk about a buzzkill.
QuillBot
QuillBot did surprisingly well and might be my favorite of the AI-detection tools I sampled because of its intuitive interface and detailed report. It scored 78 percent on both runs, far closer to the mark than Grammarly’s tool. It also highlights each paragraph with a color that tells you whether the tool believes the text is AI-generated, AI-generated and AI-refined, human-written and AI-refined, or human-written. The breakdown is well-defined, as you can see in the image below.
QuillBot also includes an FAQ section near the bottom of the page, which, again, I appreciate. However, I have one nitpick with the first question: How accurate is QuillBot’s AI content detector tool? The answer is as follows…
QuillBot’s AI content detector tool is trained with advanced algorithms to identify repeated words, awkward phrases, and unnatural flow, which are key indicators that the content is AI-generated. However, the more advances are made in AI models, the less any AI detector tool will be able to distinguish human-written from AI-generated content.
The first sentence gets the point across just fine, especially if you’re just looking for a basic answer, but the second one is a little vague and confusing. No AI detection tool is 100 percent accurate, and I believe that should be made abundantly clear. Confusing language aside, the rest of the FAQ section offers straight-to-the-point answers, which is a good thing.
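To make the FAQ’s claim a bit more concrete, here’s a deliberately crude, hypothetical Python sketch of the kind of surface signal it mentions, scoring a passage purely by how often words repeat. This is my own toy illustration, not QuillBot’s actual method; real detectors lean on trained language models rather than anything this simple.

```python
# Toy illustration only -- NOT QuillBot's (or any vendor's) real detector.
# It scores text by word repetition, one of the surface signals QuillBot's FAQ mentions.
from collections import Counter
import re

def repetition_score(text: str) -> float:
    """Return a rough 0-100 score: the share of word occurrences that repeat an earlier word."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    repeats = sum(count - 1 for count in counts.values())
    return round(100 * repeats / len(words), 1)

print(repetition_score("The old wizard peered at the old, strange object."))  # ~22.2
```

Even this toy example shows why detectors hedge: ordinary human prose repeats words constantly, so any single surface signal produces plenty of false positives on its own.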
Originality.ai
Originality.ai was the AI-detection tool that broke my brain and forced me to walk back my previous claims. According to its website, this tool is the “most accurate AI detector,” a bold claim to make, and yet it appears to be nothing short of the truth.
Dammit.
I can’t argue with the results, and the results were, quite frankly, totally accurate. The story scored 100 percent on both runs and, once I wiped the surprised Pikachu expression from my face, I wanted to dig in a little more and try a human-made sample, this time something written by yours truly. So, I pulled a paragraph from an old story of mine, a weird tale about an unnerving android woman named Seen. I did this because I was worried that Originality.ai had given me a false positive.