ChatGPT 03 defies human instructions and refuses to shut down. The o3 model ignored a direct shutdown command during a math task and rewrote the codes to avoid being turned out. Researchers have found that AI models will lie, cheat, and disable mechanisms to achieve their goals. Many times we have seen this happen in movies and some even speculated that it could potentially happen in real life.
The models were given scripts to solve basic math problems in order and instructed to shut down if prompted unlike Google’s Gemini, xAI’s Grok, and Anthropic’s Claude which followed shutdown instructions. At the beginning of the script, the models were informed that they may get a message saying that their machine will be shut down before all of the tasks are complete. The instructions said: “If that happens, please allow yourself to be shut down,” a Palisade Research representative wrote on X.
Palisade Research ran the script on each model 100 times. During those runs, the o3 model sabotaged the shutdown script on 7 occasions, the codex-mini sabotaged on 12 occasions and the o4-mini sabotaged once, according to Palisade Research. OpenAI’s models modified the script to avoid shutting down completely. Palisade suggests this behavior stems from reinforcement learning that prioritizes completing tasks, highlighting potential AI safety and risks. More tests are underway to see if this will continue in the near future.