no code implementations • 6 Mar 2024 • Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju
As large language models (LLMs) develop ever-improving capabilities and are applied in real-world settings, it is important to understand their safety.
no code implementations • 26 Jul 2023 • Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
These estimators linearize models in the local region around an input and analytically compute the robustness of the resulting linear models.
1 code implementation • 2 Jun 2022 • Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
By bringing diverse explanation methods into a common framework, this work (1) advances the conceptual understanding of these methods, revealing their shared local function approximation objective, properties, and relation to one another, and (2) guides the use of these methods in practice, providing a principled approach to choose among methods and paving the way for the creation of new ones.
no code implementations • 3 Feb 2022 • Satyapriya Krishna, Tessa Han, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, Himabindu Lakkaraju
To this end, we first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative framework to formalize this understanding.