Search Results for author: Bilal Chughtai

Found 3 papers, 2 papers with code

Can Language Models Explain Their Own Classification Behavior?

1 code implementation • 13 May 2024 • Dane Sherburn, Bilal Chughtai, Owain Evans

To explore this, we introduce a dataset, ArticulateRules, of few-shot text-based classification tasks generated by simple rules.

Classification

Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs

no code implementations • 11 Feb 2024 • Bilal Chughtai, Alan Cooney, Neel Nanda

How do transformer-based large language models (LLMs) store and retrieve knowledge?

Attribute

A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations

1 code implementation • 6 Feb 2023 • Bilal Chughtai, Lawrence Chan, Neel Nanda

Universality is a key hypothesis in mechanistic interpretability -- that different models learn similar features and circuits when trained on similar tasks.
