DeepSeek R1 vs OpenAI o1: Performance evaluation on real-world tasks

In a performance test of DeepSeek R1 vs OpenAI o1, R1's reasoning points to a shortcoming in Perplexity’s retrieval engine

Tech Desk - Feb 02, 2025

DeepSeek and OpenAI logos are seen in this illustration taken January 27, 2025. — Reuters

The launch of China-based DeepSeek's AI assistant appears to have triggered a nonstop fuss, but so have reports emerging on the internet of its inaccuracies on a number of benchmarks.

While DeepSeek R1, the reasoning model behind the DeepSeek chatbot, is said to be a strong competitor against OpenAI o1, VentureBeat ran both models through an assessment based on real-life tasks in a bid to unearth their true potential.

To ensure an even-handed evaluation, the publication used Perplexity Pro Search, which now supports both o1 and R1, and tested the AI assistants on tasks that are simple in concept but require considerable manual effort.

Using DeepSeek R1 and OpenAI o1 to calculate returns on investments

The test involved a hypothetical case in which the user invests $140 in seven tech giants on the first day of every month from January to December 2024. Both bots were then asked to calculate the value of the portfolio at the current date.

To complete the task, both reasoning models had to fetch the price of each listed company on the first day of each month, split the monthly investment evenly across the stocks ($20 per stock), add up the shares bought over the year and compute the total value of the portfolio from the stock prices on the current date.
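The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the method either model or VentureBeat used: the function names, the placeholder tickers "AAA" and "BBB" and all prices are hypothetical, and the sketch assumes fractional shares can be bought.

```python
# A minimal sketch of the monthly-investment calculation.
# All tickers and prices below are hypothetical placeholders, not real market data.

def accumulate_shares(monthly_prices, per_stock=20.0):
    """Return the shares held per ticker after investing `per_stock`
    dollars in every ticker at each month's price (fractional shares assumed)."""
    shares = {}
    for prices in monthly_prices:          # one dict of ticker -> price per month
        for ticker, price in prices.items():
            shares[ticker] = shares.get(ticker, 0.0) + per_stock / price
    return shares


def portfolio_value(monthly_prices, current_prices, per_stock=20.0):
    """Value all accumulated shares at the current prices."""
    shares = accumulate_shares(monthly_prices, per_stock)
    return sum(count * current_prices[ticker] for ticker, count in shares.items())


def total_invested(monthly_prices, per_stock=20.0):
    """Total cash put in: `per_stock` dollars per ticker per month."""
    return sum(per_stock * len(prices) for prices in monthly_prices)


if __name__ == "__main__":
    # Two hypothetical tickers over two months, for illustration only.
    monthly = [{"AAA": 10.0, "BBB": 20.0}, {"AAA": 20.0, "BBB": 10.0}]
    current = {"AAA": 10.0, "BBB": 10.0}
    value = portfolio_value(monthly, current)
    invested = total_invested(monthly)
    roi = (value - invested) / invested
    print(f"value={value:.2f} invested={invested:.2f} roi={roi:.2%}")
```

With seven stocks and twelve monthly price dictionaries, the same functions would cover the full scenario, where the total invested is $140 x 12 = $1,680. The key point is that every month's purchase price matters, so a result built only on January prices cannot be correct.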

Surprisingly, both models failed to complete the task. OpenAI's o1 responded with a list of stock prices for January 2024 and January 2025 alongside a formula for calculating the portfolio value. The model did not calculate the correct values, stating instead that there would be no return on investment (ROI).

DeepSeek's R1, on the other hand, fell short in a different way. It took into account only the January 2024 investment and calculated the ROI as of January 2025.

It is worth mentioning that while o1 gave no details on how it arrived at its results, R1’s reasoning traces pointed to a shortcoming in Perplexity’s retrieval engine: the model could not produce correct figures because the engine had failed to obtain the monthly stock price data.