Search Results for author: Marius Hobbhahn

Found 9 papers, 5 papers with code

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

1 code implementation • 17 May 2024 • Lucius Bushnaq, Stefan Heimersheim, Nicholas Goldowsky-Dill, Dan Braun, Jake Mendel, Kaarel Hänni, Avery Griffin, Jörn Stöhler, Magdalena Wache, Marius Hobbhahn

We present a novel interpretability method that aims to overcome this limitation by transforming the activations of the network into a new basis - the Local Interaction Basis (LIB).

Paper
Code

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

no code implementations • 17 May 2024 • Lucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel Hänni, Cindy Wu, Marius Hobbhahn

We propose that if we can represent a neural network in a way that is invariant to reparameterizations that exploit the degeneracies, then this representation is likely to be more interpretable, and we provide some evidence that such a representation is likely to have sparser interactions.

Learning Theory

Paper
Add Code

Black-Box Access is Insufficient for Rigorous AI Audits

no code implementations • 25 Jan 2024 • Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell

External audits of AI systems are increasingly recognized as a key mechanism for AI governance.

Paper
Add Code

Large Language Models can Strategically Deceive their Users when Put Under Pressure

1 code implementation • 9 Nov 2023 • Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn

We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so.

Management

Paper
Code

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

no code implementations • 26 Oct 2022 • Pablo Villalobos, Jaime Sevilla, Lennart Heim, Tamay Besiroglu, Marius Hobbhahn, Anson Ho

We analyze the growth of dataset sizes used in machine learning for natural language processing and computer vision, and extrapolate these using two methods; using the historical growth rate and estimating the compute-optimal dataset size for future predicted compute budgets.

Paper
Add Code

Machine Learning Model Sizes and the Parameter Gap

no code implementations • 5 Jul 2022 • Pablo Villalobos, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Anson Ho, Marius Hobbhahn

From 1950 to 2018, model size in language models increased steadily by seven orders of magnitude.

BIG-bench Machine Learning

Paper
Add Code

Compute Trends Across Three Eras of Machine Learning

1 code implementation • 11 Feb 2022 • Jaime Sevilla, Lennart Heim, Anson Ho, Tamay Besiroglu, Marius Hobbhahn, Pablo Villalobos

Since the advent of Deep Learning in the early 2010s, the scaling of training compute has accelerated, doubling approximately every 6 months.

BIG-bench Machine Learning

Paper
Code

Laplace Matching for fast Approximate Inference in Latent Gaussian Models

1 code implementation • 7 May 2021 • Marius Hobbhahn, Philipp Hennig

The method can be thought of as a pre-processing step which can be implemented in <5 lines of code and runs in less than a second.

Bayesian Inference Gaussian Processes +1

Paper
Code

Fast Predictive Uncertainty for Classification with Bayesian Deep Networks

1 code implementation • 2 Mar 2020 • Marius Hobbhahn, Agustinus Kristiadi, Philipp Hennig

We reconsider old work (Laplace Bridge) to construct a Dirichlet approximation of this softmax output distribution, which yields an analytic map between Gaussian distributions in logit space and Dirichlet distributions (the conjugate prior to the Categorical distribution) in the output space.

Classification General Classification

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.