no code implementations • 10 May 2024 • Julius Heitkoetter, Michael Gerovitch, Laker Newhouse
The trustworthiness of highly capable language models is put at risk when they are able to produce deceptive outputs.
no code implementations • 25 Jan 2024 • Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell
External audits of AI systems are increasingly recognized as a key mechanism for AI governance.