Last week, OpenAI released its new o3 and o4-mini reasoning models, which perform significantly better than their o1 and o3-mini predecessors and have new capabilities like “thinking with images” and agentically combining AI tools for more complex results.
However, according to OpenAI’s internal tests, these new o3 and o4-mini reasoning models also hallucinate significantly more often than previous AI models, TechCrunch reports. This is unusual, since newer models have tended to hallucinate less as the underlying AI technology improves.
In the realm of LLMs and reasoning AIs, a “hallucination” occurs when the model makes up information that sounds convincing but has no basis in truth. In other words, when you ask ChatGPT a question, it may respond with an answer that’s patently false.
On OpenAI’s in-house PersonQA benchmark, which measures the factual accuracy of its AI models when answering questions about people, o3 hallucinated in 33 percent of responses, while o4-mini did even worse at 48 percent. By comparison, the older o1 and o3-mini models hallucinated 16 percent and 14.8 percent of the time, respectively.
As of now, OpenAI says it doesn’t know why hallucinations have increased in the newer reasoning models. Hallucinations may be fine for creative endeavors, but they undermine the credibility of AI assistants like ChatGPT when used for tasks where accuracy is paramount. In a statement to TechCrunch, an OpenAI rep said that the company is “continually working to improve [their models’] accuracy and reliability.”