Stuart Russell, Prof. of CS at UC Berkeley, is co-author, with Peter Norvig, of Artificial Intelligence: A Modern Approach, familiarly known as AIMA. Since its first edition three decades ago, AIMA has been the pre-eminent AI textbook.
Two months ago, Russell gave a talk for the Neubauer Collegium at the University of Chicago, AI: What If We Succeed?, which is accessible to a general audience. It is the first talk I’ve seen that presents an intelligent and operationally feasible approach to AI safety. I recommend it strongly.
The basic idea.
Standard model of AI: Machines are intelligent to the extent that their actions can be expected to achieve their objectives. E.g., winning at chess.
Russell’s proposed model of beneficial AI: Machines are beneficial to the extent that their actions can be expected to achieve our objectives. This can be formulated in game theory as an “assistance game.”
Goal: design machines that:
Act in the best interests of humans.
Are explicitly uncertain what those interests are.
Uncertainty about human interests leads to deference, minimally invasive behavior, and a willingness to be switched off.
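Why would uncertainty make a machine willing to be switched off? The intuition comes from the off-switch game analyzed by Hadfield-Menell, Dragan, Abbeel, and Russell. Here is a minimal sketch of that intuition; it is my own toy model, not code from the talk, and the Gaussian belief and the name value_of_deference are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_of_deference(mean, std, n=100_000):
    """Monte Carlo estimate of the off-switch game payoffs.

    The robot's belief about the human's utility u for its proposed
    action is modeled (an illustrative assumption) as N(mean, std).
    - Acting unilaterally is worth E[u].
    - Deferring to a rational human overseer is worth E[max(u, 0)]:
      the human lets the action proceed when u > 0 and hits the
      off switch otherwise.
    """
    u = rng.normal(mean, std, n)
    act_now = u.mean()                 # E[u]: act without asking
    defer = np.maximum(u, 0).mean()    # E[max(u, 0)]: let human decide
    return act_now, defer

for std in (0.1, 1.0, 3.0):
    act_now, defer = value_of_deference(mean=0.5, std=std)
    print(f"std={std:.1f}  act_now={act_now:+.3f}  defer={defer:+.3f}")
```

Since E[max(u, 0)] is at least max(E[u], 0) for any belief, a robot that defers never does worse than one that acts unilaterally, and the advantage of deferring grows with its uncertainty about human preferences. That is exactly the incentive Russell wants: an uncertain machine prefers to leave the off switch in human hands.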
Fortuitously, I just watched Grant Sanderson’s 2024 commencement speech at Harvey Mudd. (Grant Sanderson created and runs the 3Blue1Brown YouTube channel.) Sanderson’s career advice: focus on adding value to other people’s lives. This is not a recommendation to be selfless but a way to maximize the chances that what you do will be valued by the world.
I mention this here because Sanderson’s advice is so close to Stuart Russell’s prescription for building beneficial AI: build machines that add value to other people’s lives.