AI Alignment

I want to see a future where AI systems help humanity thrive, and where AI is much more robust and reliable than existing software.

I think this will probably happen, and that AI will probably have a transformative positive impact. But this will happen only if we put in a lot of work, and there is a real risk that sophisticated AI systems will be much less robust than conventional software, especially if we measure robustness by whether a system does what the user actually wants rather than by whether it solves the problem the user specified.

AI alignment is the problem of building machines which faithfully try to do what we want them to do (or what we ought to want them to do).

I write about AI alignment here.

I started at OpenAI in January 2017, working on alignment.

I recently co-authored Concrete Problems in AI Safety, a paper exploring practical open problems in building safe AI systems.

I am a research associate at the Future of Humanity Institute.