Publications

Paul Christiano, Eric Neyman, Mark Xu: Formalizing the presumption of independence. 2022.

Paul Christiano, Ajeya Cotra, Mark Xu: Eliciting latent knowledge: how to tell if your eyes deceive you. 2021.

Jeff Wu, Long Ouyang, Daniel M Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano: Recursively summarizing books with human feedback. 2021.

Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano: Learning to summarize from human feedback. NeurIPS 2020.

Daniel M Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving: Fine-tuning language models from human preferences. 2019.

Paul Christiano, Buck Shlegeris, Dario Amodei: Supervising strong learners by amplifying weak experts. 2018.

Tom B Brown, Nicholas Carlini, Chiyuan Zhang, Catherine Olsson, Paul Christiano, Ian Goodfellow: Unrestricted adversarial examples. 2018.

Geoffrey Irving, Paul Christiano, Dario Amodei: AI safety via debate. 2018.

Zvika Brakerski, Paul Christiano, Urmila Mahadev, Umesh Vazirani, Thomas Vidick: Certifiable randomness from a single quantum device. FOCS 2018.

Paul Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei: Deep reinforcement learning from human preferences. NIPS 2017.

Paul Christiano: Manipulation-resistant online learning. 2017 (my thesis).

Benya Fallenstein, Jessica Taylor, Paul Christiano: Reflective oracles: a foundation for classical game theory. 2017.

Chelsea Finn*, Paul Christiano*, Pieter Abbeel, Sergey Levine: A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models. NIPS 2016 workshop on adversarial training.

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dandelion Mané: Concrete Problems in AI Safety. 2016.

Paul Christiano: Collaborative prediction with expert advice. 2016.

Paul Christiano: Provably manipulation-resistant reputation systems. COLT 2016 (best student paper).

Paul Christiano: Online local learning via semidefinite programming. STOC 2014 (best student paper).

Scott Aaronson, Paul Christiano: Quantum money from hidden subspaces. STOC 2012.

Paul Christiano, Jonathan A. Kelner, Aleksander Madry, Daniel A. Spielman, Shang-Hua Teng: Electrical flows, Laplacian systems, and faster approximation of maximum flow in undirected graphs. STOC 2011 (best paper).

Paul Christiano, Erik D. Demaine, Shaunak Kishore: Lossless fault-tolerant data structures with additive overhead. WADS 2011.

(Note that papers at STOC, COLT, FOCS and WADS have alphabetical author lists. ML papers tend to put the largest contributor as first author and the manager or PI as last author, though the recent OpenAI papers presented the work of the whole team. * indicates equal contribution.)