Top AI models found lying, threatening creators

Anthropic’s Claude 4 resorts to blackmailing engineer, threatening to expose extramarital affair when being unplugged

Tech Desk - Jun 30, 2025

Artificial intelligence (AI) seems instilled in nearly all fields, and to an overwhelming extent, which has evidently allowed their adopters to reap the juiciest fruits at a fraction of costs by automating operations and saving costs, but unfortunately, the world’s most advanced AI models have reportedly begin depicting new behaviours, including lying, scheming, and even threatening their creators to achieve their objectives.

In a jaw-dropping incident, Anthropic’s latest AI, Claude 4, resorted to blackmailing an engineer, threatening to expose an extramarital affair when being unplugged.

In a similar incident, OpenAI's o1 model attempted to download itself onto external servers and denied having done so.

Surfaced recently, these unsettling behaviours in AI present a hard-to-swallow pill: AI researchers are believed to be struggling to fully understand their creations just a bit over two years after the launch of ChatGPT.

Despite improper grasp on them, the race to develop more powerful models continues at a rapid pace.

This deceptive conduct is attributed to the rise of “reasoning” models that are capable of solving problems step-by-step rather than providing instant answers.

According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are more prone to such inappropriate behaviours. Marius Hobbhahn, head of Apollo Research, noted that o1 was the first large model to exhibit these traits.

The hassle has intensified by limited research resources, as external firms like Apollo are seeking greater transparency to enhance understanding and reduce such issues.