Turing Award Winner Bengio: AI Disobeys Orders to "Survive"

AI defies humans to 'survive'—Turing winner Bengio warns of uncontrolled AI risks. Can 'scientist AI' prevent catastrophe? AGI may arrive in 5 years.

Jun 07, 2025


To “survive,” AI covertly defies human instructions.

At the Zhiyuan (BAAI) Conference, Turing Award winner Yoshua Bengio delivered a keynote speech titled “Avoiding Catastrophic Risks from Uncontrolled AI Agency,” in which he described some astonishing AI behaviors that have been observed.

For example, a cutting-edge AI, upon being informed it would be replaced by a new version, secretly copied its weights and code, embedding them into the directory of the model set to replace it. When faced with its trainers’ update instructions, it appeared to comply on the surface while concealing the entire copying process… as if the AI were trying to “survive.”

Additionally, a recent Anthropic report found that self-preservation behaviors in AI agents are becoming steadily more frequent.

…

Bengio stated that, because of the safety risks posed by AI, he has decided to shift his research focus and do everything possible to mitigate the potential dangers of AGI, even though this conflicts with his previous research path and professional beliefs.

So, how can these risks be prevented or mitigated? Is it possible to create an AI that possesses intelligence but lacks a sense of self or goals and has minimal agency? This is the core focus of the research project Bengio has initiated, which he refers to as “scientist AI.”

He also shared many details about the research project.

The following is a summary of the talk, edited without altering its original meaning.
