Search Results for author: Luxi He

Found 3 papers, 0 papers with code

AI Risk Management Should Incorporate Both Safety and Security

no code implementations 29 May 2024 Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security.

What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety

no code implementations 1 Apr 2024 Luxi He, Mengzhou Xia, Peter Henderson

Current Large Language Models (LLMs), even those tuned for safety and alignment, are susceptible to jailbreaking.

Math

Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions

no code implementations NeurIPS 2023 Hao Wang, Luxi He, Rui Gao, Flavio P. Calmon

We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development.

Fairness
