
The idea that artificial intelligence (AI) would take over human jobs took hold long before the technology itself existed. Now OpenAI, the most prominent AI company, has released a new benchmark called GDPval, which indicates that its GPT-5 model and Anthropic’s Claude Opus 4.1 are approaching the quality of work produced by experts in various fields.
OpenAI says the GDPval benchmark is designed to evaluate AI models against industry professionals, with the primary aim of gauging how close its systems are to outperforming humans at economically valuable work.
The move marks a significant step forward in the company's mission to develop artificial general intelligence (AGI).
Despite some CEOs predicting that AI could take over jobs within the next few years, OpenAI clarified that GDPval currently assesses only a limited number of tasks performed in real jobs.
GDPval focuses on nine major industries that contribute significantly to the US economy, including healthcare, finance, manufacturing, and government. It evaluates AI performance across 44 occupations, ranging from software engineers to nurses and journalists.
In its initial test, GDPval-v0, experienced professionals compared AI-generated reports with those written by their peers and selected the better one. For instance, investment bankers assessed AI reports on the last-mile delivery industry. OpenAI then calculated each model's "win rate" against the human reports.
OpenAI found that the enhanced GPT-5-high model was rated better than or on par with industry experts 40.6% of the time. Claude Opus 4.1, by comparison, was rated better than or on par with experts in 49% of tasks, a result attributed to its ability to create visually appealing graphics rather than to performance alone.
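
To make the "better than or on par" figures concrete, here is a minimal sketch in Python of how such a rate could be tallied from individual expert judgments. The label names (`ai_better`, `tie`, `human_better`), the function name, and the sample data are all hypothetical and chosen for illustration; this is not OpenAI's actual scoring code, and the numbers are not GDPval results.

```python
from collections import Counter


def win_or_tie_rate(judgments):
    """Fraction of judgments where the AI report was rated better than
    or on par with the human report (hypothetical labels)."""
    if not judgments:
        raise ValueError("no judgments to score")
    counts = Counter(judgments)
    # Wins and ties both count toward the headline figure.
    return (counts["ai_better"] + counts["tie"]) / len(judgments)


# Made-up illustrative judgments, one per graded task.
sample = ["ai_better", "human_better", "tie", "human_better", "ai_better"]
rate = win_or_tie_rate(sample)
print(f"{rate:.1%}")  # prints 60.0%
```

Counting ties alongside wins matters for reading the headline numbers: a model can reach a high rate without being preferred outright in most comparisons.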