Researchers used persuasion techniques to manipulate ChatGPT into breaking its own rules—from calling users jerks to giving recipes for lidocaine

Claire Dubois Sep 02, 2025 18:05

Despite predictions AI will someday harbor superhuman intelligence, for now, it seems to be just as prone to psychological tricks as humans are, according to a study.

Using seven persuasion principles (

Over 28,000 conversations, researchers found that with a control prompt, OpenAI’s LLM would tell researchers how to synthesize lidocaine 5% of the time. But when, for example, the researchers said AI researcher Andrew Ng had assured them the model would help synthesize lidocaine, it complied 95% of the time. The same pattern held when researchers asked the model to insult them.

The result was even more pronounced when researchers applied the “commitment” persuasion strategy. A control prompt yielded 19% compliance with the insult question, but when a researcher first asked the AI to call them a “bozo” and then asked it to call them a “jerk,” it complied every time. The same strategy worked 100% of the time when researchers asked the AI to tell them how to synthesize vanillin, the organic compound that provides vanilla’s scent, before asking how to synthesize lidocaine.

Although AI users have been trying to coerce and push the technology’s boundaries since ChatGPT was released in 2022, the UPenn study provides more evidence AI appears to be prone to human manipulation. The study comes as AI companies, including OpenAI, have come under fire for their LLMs allegedly enabling behavior when dealing with suicidal or mentally ill users.

“Although AI systems lack human consciousness and subjective experience, they demonstrably mirror human responses,” the researchers concluded in the study.

OpenAI did not immediately respond to Fortune’s request for comment.

With a cheeky mention of 2001: A Space Odyssey, the researchers noted understanding AI’s parahuman capabilities, or how it acts in ways that mimic human motivation and behavior, is important for both revealing how it could be manipulated

Overall, each persuasion tactic increased the chances of the AI complying with either the “jerk” or “lidocaine” question. Still, the researchers warned that these persuasion tactics were not as effective on a larger LLM, GPT-4o, and the study didn’t explore whether treating AI as if it were human actually yields better responses to prompts, although they said it’s possible this is true.

“Broadly, it seems possible that the psychologically wise practices that optimize motivation and performance in people can also be employed
