OpenAI’s newest AI models, o3 and o4-mini, making more factual errors

o3 provides false information 33% of the time on a benchmark measuring knowledge about people
An undated image. — Unsplash

In a development that could undermine OpenAI's innovative edge in artificial intelligence (AI), it emerged on Sunday that the company's latest AI models, o3 and o4-mini, hallucinate (make factual errors) more often than older models.

Hallucinations occur when AI models fabricate information that can mislead users, and they remain one of the major unsolved problems in AI systems.

The embarrassing setback comes even though the ChatGPT maker's latest models were said to raise the performance bar on coding and mathematics tasks.

OpenAI's earlier models also hallucinated at times but improved with each release; the company's own testing, however, showed that o3 and o4-mini fail to match, let alone surpass, earlier models such as o1, o1-mini, o3-mini, and even GPT-4o on this front.

The company acknowledged in its technical report that it does not yet understand why hallucinations are increasing and highlighted the need for further research. While o3 and o4-mini are more likely to provide detailed answers, this also leads to a higher number of both accurate and inaccurate claims.

When tested, o3 provided false information 33% of the time on a benchmark measuring knowledge about people, double the rate of some older models. o4-mini scored even worse, at 48%.

Surprisingly, third-party lab Transluce found that o3 sometimes claimed to have performed actions it never did, such as saying it had run code on a MacBook.

Although such factual errors may seem problematic, some experts view them as potentially useful for creativity.