AI alignment

I want to see a future where AI systems help humanity thrive. I think this will probably happen, but only because the ML community will probably put in a whole lot of work. If we fail at this work, I think we run a real risk of making humanity’s future much worse.

AI alignment is the problem of building machines which faithfully try to do what we want them to do (or what we ought to want them to do). I write about alignment here.

Since March 2021 I have been running the Alignment Research Center.

From January 2017-January 2021 I worked on the safety team at OpenAI.

I am an advisor to the UK AI Safety Institute, a trustee of Anthropic’s Long-Term Benefit Trust, and on assorted other boards and advisory panels.

Many people have found this interview helpful for understanding my perspective on AI alignment.

Paul Christiano

Paul Christiano

AI alignment