Did China’s DeepSeek use OpenAI data? Evidence points to AI model copying

OpenAI itself trained its early models using large amounts of internet data without explicit permission, making controversy ironic

Tech Desk - Jan 29, 2025

An undated image of the DeepSeek logo. — DeepSeek/Canva

OpenAI suspects that China-based artificial intelligence (AI) company DeepSeek used its technology to train rival AI models, raising concerns about intellectual property (IP) theft in the AI industry.

DeepSeek recently gained attention by releasing AI models that rival OpenAI’s GPT-4 but at a much lower cost. Now, OpenAI and Microsoft are investigating whether DeepSeek used OpenAI’s application programming interface (API) to extract data and train its models.

According to Bloomberg, Microsoft security researchers noticed unusual data transfers from OpenAI developer accounts in late 2024. OpenAI believes these accounts were linked to DeepSeek and used a process called distillation.

This technique involves training smaller, cheaper models on the outputs of more advanced AI models. While distillation is a common practice, using OpenAI’s outputs to build competing models violates the company’s terms of service.
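For readers curious about what distillation looks like in practice, the following is a minimal toy sketch in Python. It is not DeepSeek's or OpenAI's method. The "teacher" here is a made-up function standing in for a large model behind an API, and the names `teacher_predict`, `distill` and `student_predict` are hypothetical. The point it illustrates is that the "student" only ever sees the teacher's outputs, never its internal parameters.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_predict(x):
    """Hypothetical teacher: a fixed function standing in for a large model's API."""
    return softmax([2.0 * x, -1.0 * x, 0.5])


def student_predict(w, b, x):
    """Student model: one weight and one bias per output class."""
    return softmax([w[k] * x + b[k] for k in range(3)])


def distill(num_steps=5000, lr=0.1, seed=0):
    """Train the student to imitate the teacher's output probabilities."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    b = [0.0, 0.0, 0.0]
    for _ in range(num_steps):
        x = rng.uniform(-1.0, 1.0)       # a query sent to the teacher
        target = teacher_predict(x)      # the teacher's answer (soft labels)
        pred = student_predict(w, b, x)  # the student's current guess
        for k in range(3):
            # Gradient of cross-entropy between teacher and student outputs
            grad = pred[k] - target[k]
            w[k] -= lr * grad * x
            b[k] -= lr * grad
    return w, b


if __name__ == "__main__":
    w, b = distill()
    print("teacher:", teacher_predict(0.5))
    print("student:", student_predict(w, b, 0.5))
```

In this toy the student has as many parameters as the teacher, so it shows only the imitation step. In real systems the student is much smaller than the teacher, and the queries are text prompts rather than single numbers.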

OpenAI told the Financial Times that it found evidence connecting DeepSeek to this activity but has not disclosed specific details. This situation highlights the ongoing challenge of protecting AI innovations.

Interestingly, OpenAI itself trained its early models using large amounts of internet data without explicit permission, making the controversy ironic.

Moreover, David Sacks, AI adviser to US President Donald Trump, said there is "substantial evidence" of DeepSeek copying OpenAI’s technology. OpenAI also told Bloomberg that AI companies worldwide are attempting to distil knowledge from leading US AI models. The company said it is working with the US government to prevent foreign rivals from gaining access to advanced AI technology.