Search Results for author: Aaron J. Li

Found 2 papers, 1 paper with code

More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness

1 code implementation29 Apr 2024 Aaron J. Li, Satyapriya Krishna, Himabindu Lakkaraju

The surge in the development of Large Language Models (LLMs) has led to improved performance on cognitive tasks, as well as an urgent need to align these models with human values so that their power can be exploited safely.

Ethics · Language Modelling

Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining

No code implementations · 8 Jul 2023 · Aaron J. Li, Robin Netzorg, Zhihan Cheng, Zhuoqin Zhang, Bin Yu

In recent years, work has gone into developing deep interpretable methods for image classification that clearly attribute a model's output to specific features of the data.

Image Classification
