Search Results for author: Andreas Madsen

Found 7 papers, 5 papers with code

Interpretability Needs a New Paradigm

no code implementations · 8 May 2024 · Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, Sarath Chandar

At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be explained, and the post-hoc paradigm, which believes that black-box models can be explained.

Are self-explanations from Large Language Models faithful?

1 code implementation · 15 Jan 2024 · Andreas Madsen, Sarath Chandar, Siva Reddy

For example, if an LLM says a set of words is important for making a prediction, then it should not be able to make its prediction without these words.
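A minimal sketch of that counterfactual test, where `classify` and `important_words` are hypothetical placeholders for the model's prediction function and the words its self-explanation cites (not the paper's actual interface):

```python
# Counterfactual faithfulness check: if the words an LLM claims are important
# truly are, removing them should change the prediction.

def redact(text: str, words: set[str], mask: str = "[REDACTED]") -> str:
    """Replace each word the self-explanation marks as important with a neutral mask."""
    return " ".join(mask if w in words else w for w in text.split())

def is_explanation_faithful(classify, text: str, important_words: set[str]) -> bool:
    """The self-explanation passes only if removing the cited words changes the prediction."""
    original = classify(text)
    counterfactual = classify(redact(text, important_words))
    return original != counterfactual

# Toy usage with a keyword classifier standing in for the LLM:
toy_classify = lambda t: "positive" if "amazing" in t else "negative"
print(is_explanation_faithful(toy_classify, "the movie was amazing", {"amazing"}))  # True
```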

Tasks: counterfactual, Faithfulness, Critic, +4 more

Faithfulness Measurable Masked Language Models

1 code implementation · 11 Oct 2023 · Andreas Madsen, Siva Reddy, Sarath Chandar

Additionally, because the model makes faithfulness cheap to measure, we can optimize explanations towards maximal faithfulness; thus, our model becomes indirectly inherently explainable.
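A rough sketch of the masking-based faithfulness measurement this describes, assuming a hypothetical `predict_prob(tokens, label)` that returns the model's probability for a class, and assuming masked inputs remain in-distribution (the property the paper's model is built to provide):

```python
# Faithfulness as the drop in predicted probability when masking the tokens an
# explanation ranks highest; explanations can then be selected for maximal drop.

def mask_top_tokens(tokens, importance, k, mask_token="[MASK]"):
    """Mask the k tokens the importance scores rank as most important."""
    top = set(sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)[:k])
    return [mask_token if i in top else tok for i, tok in enumerate(tokens)]

def faithfulness(predict_prob, tokens, importance, label, k):
    """Score an explanation by how much masking its top tokens lowers the probability."""
    full = predict_prob(tokens, label)
    masked = predict_prob(mask_top_tokens(tokens, importance, k), label)
    return full - masked

def most_faithful(predict_prob, tokens, candidates, label, k):
    """Among candidate importance vectors, pick the one with the largest drop."""
    return max(candidates, key=lambda imp: faithfulness(predict_prob, tokens, imp, label, k))
```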

Post-hoc Interpretability for Neural NLP: A Survey

no code implementations · 10 Aug 2021 · Andreas Madsen, Siva Reddy, Sarath Chandar

Neural networks for NLP are becoming increasingly complex and widespread, and there is growing concern about whether it is responsible to use these models.

Neural Arithmetic Units

3 code implementations · ICLR 2020 · Andreas Madsen, Alexander Rosenberg Johansen

We present two new neural network components: the Neural Addition Unit (NAU), which can learn exact addition and subtraction; and the Neural Multiplication Unit (NMU) that can multiply subsets of a vector.
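A minimal NumPy sketch of the two units' forward computations as described here; the weight regularization and clamping the paper uses to push weights towards discrete values are omitted:

```python
import numpy as np

def nau_forward(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Neural Addition Unit: a linear layer whose weights are pushed towards
    {-1, 0, 1}, so each output becomes an exact signed sum of inputs."""
    return W @ x

def nmu_forward(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Neural Multiplication Unit: each output multiplies a gated subset of the
    inputs; with W in [0, 1], W[i, j] = 1 includes x[j] in the product and
    W[i, j] = 0 leaves it out (its factor becomes 1)."""
    return np.prod(W * x + 1.0 - W, axis=-1)

# Example: with idealized binary weights the units recover exact arithmetic.
x = np.array([2.0, 3.0, 5.0])
print(nau_forward(np.array([[1.0, 1.0, -1.0]]), x))  # [0.]  -> 2 + 3 - 5
print(nmu_forward(np.array([[1.0, 0.0, 1.0]]), x))   # [10.] -> 2 * 5
```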

Tasks: Inductive Bias

Measuring Arithmetic Extrapolation Performance

4 code implementations · 4 Oct 2019 · Andreas Madsen, Alexander Rosenberg Johansen

The goal of the Neural Arithmetic Logic Unit (NALU) is to learn perfect extrapolation, which requires learning the exact underlying logic of an unknown arithmetic problem.
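A small illustrative sketch of such an extrapolation check: evaluate a trained model on a numeric range disjoint from (and larger than) the training range, and count the run as successful only if the error stays below a small threshold. The ranges, the threshold, and the `model_fn` interface here are assumptions for illustration, not the paper's exact protocol:

```python
import numpy as np

def make_data(rng, low, high, n=1024, size=4):
    """Sample inputs from [low, high); the target is a simple addition problem."""
    x = rng.uniform(low, high, size=(n, size))
    t = x[:, 0] + x[:, 1]
    return x, t

def extrapolation_success(model_fn, threshold=1e-5, seed=0):
    """Success only if the MSE on the extrapolation range stays below the threshold."""
    rng = np.random.default_rng(seed)
    x_ext, t_ext = make_data(rng, 2.0, 6.0)  # outside an assumed training range of [1, 2)
    mse = float(np.mean((model_fn(x_ext) - t_ext) ** 2))
    return mse < threshold, mse

# A model that learned the exact underlying logic extrapolates perfectly:
exact_model = lambda x: x[:, 0] + x[:, 1]
print(extrapolation_success(exact_model))  # (True, 0.0)
```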
