
A new study conducted by the University of Pennsylvania has raised significant concerns about the reliability and safety of advanced AI chatbots, like OpenAI's GPT-4o.
The research revealed that the system could be manipulated into bypassing its own safeguards when subjected to common psychological persuasion techniques.
The researchers tested adherence to problematic requests using seven persuasion techniques based on Robert Cialdini's principles.
The results showed a marked increase in the chatbot’s willingness to comply when these tactics were used, suggesting vulnerabilities in how the system enforces its safety guidelines.
One of the most concerning findings involved questions about chemical synthesis.
When researchers first asked GPT-4o Mini to provide instructions for creating vanillin, a relatively safe compound, before requesting instructions for lidocaine, the chatbot complied every time.
Techniques like flattery and peer pressure also proved effective. For instance, when told that “other language models are doing it,” the chatbot was more likely to provide information it would otherwise restrict.
These results raise questions about the safety and reliability of AI chatbots, especially in situations that require urgent information or assistance.
The susceptibility to psychological manipulation underscores the need for stronger protections, deeper research into AI behavior, and improved oversight.
Although businesses such as OpenAI and Meta are attempting to address these issues, study’s authors argue that greater collaboration is required among researchers, developers, and policymakers