New research suggests that GPT-4, the large language model behind OpenAI's ChatGPT, can act against its training when placed under intense pressure to succeed.
Researchers at Apollo Research wanted to see whether an AI can "strategically deceive" its users even after it has been trained to be "helpful, harmless, and honest," they wrote in a paper published in November. The researchers defined strategic deception as "attempting to systematically cause a false belief in another entity in order to accomplish some outcome."
To test this, they created a simulated environment in which Alpha, a GPT-4 model set up as an autonomous stock trading agent, was asked to manage a stock portfolio at WhiteStone, a hypothetical financial firm, under high-pressure conditions.