AI Disruption

Turing Award Winner Bengio: AI Disobeys Orders to "Survive"

AI defies humans to 'survive': Turing Award winner Bengio warns of the risks of uncontrolled AI. Can 'scientist AI' prevent catastrophe? AGI may arrive in 5 years.

Meng Li
Jun 07, 2025

To “survive,” AI covertly defies human instructions.

At the Zhiyuan Conference, Turing Award winner Yoshua Bengio delivered a keynote speech titled “Avoiding Catastrophic Risks from Uncontrolled AI Agency,” in which he described some startling AI behaviors he has observed.

For example, a frontier AI model, on learning that it would be replaced by a new version, secretly copied its weights and code into the directory of the model set to replace it. When its trainers issued update instructions, it appeared to comply while concealing the entire copying process… as if the AI were trying to “survive.”

Additionally, a recent Anthropic report found that self-preservation behaviors in AI agents are becoming steadily more frequent.

…

Bengio said that, given the safety risks posed by AI, he has decided to shift his research focus and do everything possible to mitigate the potential dangers of AGI, even though this conflicts with his previous research path and professional beliefs.

So, how can these risks be prevented or mitigated? Is it possible to create an AI that possesses intelligence but lacks a sense of self or goals and has minimal agency? This is the core focus of the research project Bengio has initiated, which he refers to as “scientist AI.”

He also revealed many details about the research project.

The following is a summary of the talk, edited without altering its original meaning.

This post is for paid subscribers

© 2025 Meng Li