no code implementations • 30 Oct 2023 • Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao
The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.
no code implementations • 28 Aug 2023 • Peter S. Park, Simon Goldstein, Aidan O'Gara, Michael Chen, Dan Hendrycks
This paper argues that a range of current AI systems have learned how to deceive humans.
1 code implementation • 5 Jul 2023 • Aidan O'Gara
We conduct experiments with agents controlled by GPT-3, GPT-3.5, and GPT-4 and find evidence of deception and lie detection capabilities.