no code implementations • 30 Oct 2023 • Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao
The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.
no code implementations • 28 Aug 2023 • Peter S. Park, Simon Goldstein, Aidan O'Gara, Michael Chen, Dan Hendrycks
This paper argues that a range of current AI systems have learned how to deceive humans.
1 code implementation • 5 Jul 2023 • Aidan O'Gara
We conduct experiments with agents controlled by GPT-3, GPT-3.5, and GPT-4 and find evidence of deception and lie detection capabilities.