Search Results for author: Michael Gerovitch

Found 2 papers, 0 papers with code

An Assessment of Model-On-Model Deception

no code implementations • 10 May 2024 • Julius Heitkoetter, Michael Gerovitch, Laker Newhouse

The trustworthiness of highly capable language models is put at risk when they are able to produce deceptive outputs.

Paper
Add Code

Black-Box Access is Insufficient for Rigorous AI Audits

no code implementations • 25 Jan 2024 • Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell

External audits of AI systems are increasingly recognized as a key mechanism for AI governance.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.