Search Results for author: Bilal Chughtai

Found 3 papers, 2 papers with code

Can Language Models Explain Their Own Classification Behavior?

1 code implementation • 13 May 2024 • Dane Sherburn, Bilal Chughtai, Owain Evans

To explore this, we introduce a dataset, ArticulateRules, of few-shot text-based classification tasks generated by simple rules.

Classification

Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs

no code implementations • 11 Feb 2024 • Bilal Chughtai, Alan Cooney, Neel Nanda

How do transformer-based large language models (LLMs) store and retrieve knowledge?

Attribute

A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations

1 code implementation • 6 Feb 2023 • Bilal Chughtai, Lawrence Chan, Neel Nanda

Universality is a key hypothesis in mechanistic interpretability -- that different models learn similar features and circuits when trained on similar tasks.
