1 code implementation • 15 Apr 2024 • Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs).
no code implementations • 15 Mar 2024 • Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen
To prepare for these decisions, we investigate how developers could make a 'safety case': a structured argument that an AI system is unlikely to cause a catastrophe.
no code implementations • 4 Mar 2024 • James Urquhart Allingham, Bruno Kacper Mlodozeniec, Shreyas Padhy, Javier Antorán, David Krueger, Richard E. Turner, Eric Nalisnick, José Miguel Hernández-Lobato
Correctly capturing the symmetry transformations of data can lead to efficient models with strong generalization capabilities, though methods incorporating symmetries often require prior knowledge.
no code implementations • 25 Jan 2024 • Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell
External audits of AI systems are increasingly recognized as a key mechanism for AI governance.
no code implementations • 23 Jan 2024 • Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung
Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks.
no code implementations • 22 Dec 2023 • Alan Chan, Ben Bucknall, Herbie Bradley, David Krueger
Public release of the weights of pretrained foundation models, otherwise known as downloadable access (Solaiman et al., 2023), enables fine-tuning without the prohibitive expense of pretraining.
no code implementations • 26 Oct 2023 • Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila Mcilraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann
In this short consensus paper, we outline risks from upcoming, advanced AI systems.
1 code implementation • 23 Oct 2023 • Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, David Krueger
Brown et al. (2020) famously introduced the phenomenon of in-context learning in large language models (LLMs).
1 code implementation • 4 Oct 2023 • Thomas Coste, Usman Anwar, Robert Kirk, David Krueger
Gao et al. (2023) studied this phenomenon in a synthetic human feedback setup, with a significantly larger "gold" reward model acting as the true reward (instead of humans), and showed that overoptimization remains a persistent problem regardless of the size of the proxy reward model and the amount of training data used.
no code implementations • 27 Jul 2023 • Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals.
1 code implementation • NeurIPS 2023 • Stephen Chung, Ivan Anokhin, David Krueger
This approach eliminates the need for handcrafted planning algorithms: the agent learns to plan autonomously, and its plan can be easily interpreted through visualization.
1 code implementation • 19 Apr 2023 • Shoaib Ahmed Siddiqui, David Krueger, Thomas Breuel
Modern deep learning architectures for object recognition generalize well to novel views, but the mechanisms are not well understood.
1 code implementation • 10 Mar 2023 • Xander Davies, Lauro Langosco, David Krueger
A principled understanding of generalization in deep learning may require unifying disparate observations under a single conceptual framework.
1 code implementation • 3 Feb 2023 • Shoaib Ahmed Siddiqui, David Krueger, Yann LeCun, Stéphane Deny
Current state-of-the-art deep networks are all powered by backpropagation.
no code implementations • 9 Jan 2023 • Lev McKinney, Yawen Duan, David Krueger, Adam Gleave
Our work focuses on demonstrating and studying the causes of these relearning failures in the domain of preference-based reward learning.
no code implementations • 27 Nov 2022 • Alan Clark, Shoaib Ahmed Siddiqui, Robert Kirk, Usman Anwar, Stephen Chung, David Krueger
Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin.
1 code implementation • 15 Nov 2022 • Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka
We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss.
1 code implementation • 26 Oct 2022 • Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger
Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms cannot express, such as the non-monotonic transitions present in the scaling behavior of phenomena like double descent, and the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic.
1 code implementation • 6 Oct 2022 • Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan
Compared to existing methods, we obtain similar or superior worst-case adversarial robustness on attacks seen during training.
no code implementations • 27 Sep 2022 • Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David Krueger
We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\mathcal{\tilde{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$.
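The proxy-versus-true-reward gap can be illustrated with a minimal toy sketch (not the paper's formalism; the feature setup and both reward functions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: 100 candidate policies, each summarized by two features.
# The true reward R values both features, while the imperfect proxy
# R_tilde rewards the first feature and penalizes the second.
features = rng.uniform(0, 1, size=(100, 2))

def true_reward(f):
    return f[0] + f[1]            # R: the reward we actually care about

def proxy_reward(f):
    return f[0] - 0.5 * f[1]      # R_tilde: a misspecified proxy

# "Optimizing" each reward = selecting the highest-scoring candidate policy.
best_proxy = max(range(len(features)), key=lambda i: proxy_reward(features[i]))
best_true = max(range(len(features)), key=lambda i: true_reward(features[i]))

# The proxy-optimal policy underperforms on the true reward: reward hacking.
gap = true_reward(features[best_true]) - true_reward(features[best_proxy])
```

Because the proxy systematically undervalues the second feature, selecting for it steers away from policies the true reward prefers.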
1 code implementation • 20 Sep 2022 • Shoaib Ahmed Siddiqui, Nitarshan Rajkumar, Tegan Maharaj, David Krueger, Sara Hooker
Modern machine learning research relies on relatively few carefully curated datasets.
1 code implementation • 27 Dec 2021 • Enoch Tetteh, Joseph Viviano, Yoshua Bengio, David Krueger, Joseph Paul Cohen
Learning models that generalize under different distribution shifts in medical imaging has been a long-standing research challenge.
no code implementations • 14 Dec 2021 • Shahar Avin, Haydn Belfield, Miles Brundage, Gretchen Krueger, Jasmine Wang, Adrian Weller, Markus Anderljung, Igor Krawczuk, David Krueger, Jonathan Lebensold, Tegan Maharaj, Noa Zilberman
The range of application of artificial intelligence (AI) is vast, as is the potential for harm.
no code implementations • 29 Sep 2021 • David Krueger, Tegan Maharaj, Jan Leike
We use these unit tests to demonstrate that changes to the learning algorithm (e.g., introducing meta-learning) can cause previously hidden incentives to be revealed, resulting in qualitatively different behaviour despite no change in the performance metric.
4 code implementations • 28 May 2021 • Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, Laurent Orseau, David Krueger
We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL).
no code implementations • 13 Nov 2020 • David Krueger, Jan Leike, Owain Evans, John Salvatier
Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0.
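The query-cost mechanic can be sketched with a minimal two-armed bandit (illustrative only, not the paper's algorithm; the explore-then-exploit schedule and all constants are invented):

```python
import random

random.seed(0)

# Active-RL bandit sketch: two arms with unknown Bernoulli payoffs.
# Observing a reward costs c > 0, so the agent queries (pays to observe)
# only during an initial exploration phase, then acts greedily without querying.
c = 0.1
true_means = [0.3, 0.7]
estimates = [0.0, 0.0]
counts = [0, 0]
total_payoff = 0.0

for t in range(1000):
    if t < 100:                       # exploration: pay the query cost
        arm = t % 2
        r = 1.0 if random.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]
        total_payoff += r - c         # reward observed, cost paid
    else:                             # exploitation: act greedily, never query
        arm = max(range(2), key=lambda a: estimates[a])
        r = 1.0 if random.random() < true_means[arm] else 0.0
        total_payoff += r             # reward earned but never observed
```

The key departure from standard RL is that the agent's return includes the cost of every observation it chooses to make, so deciding *when* to look at the reward becomes part of the problem.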
no code implementations • 19 Sep 2020 • David Krueger, Tegan Maharaj, Jan Leike
We introduce the term auto-induced distributional shift (ADS) to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs.
no code implementations • 30 May 2020 • Andrew Critch, David Krueger
Framed in positive terms, this report examines how technical AI research might be steered in a manner that is more attentive to humanity's long-term prospects for survival as a species.
no code implementations • 15 Apr 2020 • Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, Sarah de Haas, Maritza Johnson, Ben Laurie, Alex Ingerman, Igor Krawczuk, Amanda Askell, Rosario Cammarota, Andrew Lohn, David Krueger, Charlotte Stix, Peter Henderson, Logan Graham, Carina Prunkl, Bianca Martin, Elizabeth Seger, Noa Zilberman, Seán Ó hÉigeartaigh, Frens Kroeger, Girish Sastry, Rebecca Kagan, Adrian Weller, Brian Tse, Elizabeth Barnes, Allan Dafoe, Paul Scharre, Ariel Herbert-Voss, Martijn Rasser, Shagun Sodhani, Carrick Flynn, Thomas Krendl Gilbert, Lisa Dyer, Saif Khan, Yoshua Bengio, Markus Anderljung
With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development.
4 code implementations • 2 Mar 2020 • David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, Aaron Courville
Distributional shift is one of the major obstacles when transferring machine learning prediction systems from the lab to the real world.
3 code implementations • 19 Nov 2018 • Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg
One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions.
no code implementations • ICLR 2019 • Alexandre Lacoste, Boris Oreshkin, Wonchang Chung, Thomas Boquet, Negar Rostamzadeh, David Krueger
The result is a rich and meaningful prior capable of few-shot learning on new tasks.
5 code implementations • ICML 2018 • Chin-wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville
Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF).
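The affine autoregressive transform that MAF and IAF share can be sketched as follows (a minimal illustration; `mu_fn` and `log_sigma_fn` are hypothetical stand-ins for the learned conditioner networks):

```python
import numpy as np

# Hypothetical "conditioners": any functions of the earlier dimensions work.
def mu_fn(x_prev):
    return 0.5 * float(np.sum(x_prev))

def log_sigma_fn(x_prev):
    return 0.1 * float(np.sum(x_prev))

def forward(x):
    """Data -> noise. Each u_i depends only on x_{<i}, so density evaluation
    is parallelizable (MAF's fast direction)."""
    u = np.empty_like(x)
    log_det = 0.0
    for i in range(len(x)):
        mu, ls = mu_fn(x[:i]), log_sigma_fn(x[:i])
        u[i] = (x[i] - mu) * np.exp(-ls)
        log_det -= ls
    return u, log_det

def inverse(u):
    """Noise -> data. Sequential, since x_i needs the already-reconstructed
    x_{<i} (IAF parameterizes the opposite direction to make sampling fast)."""
    x = np.empty_like(u)
    for i in range(len(u)):
        mu, ls = mu_fn(x[:i]), log_sigma_fn(x[:i])
        x[i] = u[i] * np.exp(ls) + mu
    return x

x = np.array([0.3, -1.2, 0.7])
u, log_det = forward(x)
x_rec = inverse(u)
```

The autoregressive structure makes one direction of the flow parallel and the other sequential; MAF and IAF are the two choices of which direction gets to be fast.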
1 code implementation • 31 Jan 2018 • Joel Ruben Antony Moniz, David Krueger
We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory.
no code implementations • 13 Dec 2017 • Alexandre Lacoste, Thomas Boquet, Negar Rostamzadeh, Boris Oreshkin, Wonchang Chung, David Krueger
The recent literature on deep learning offers new tools to learn a rich probability distribution over high dimensional data such as images or sounds.
no code implementations • ICLR 2018 • David Krueger, Chin-wei Huang, Riashat Islam, Ryan Turner, Alexandre Lacoste, Aaron Courville
We study Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks.
2 code implementations • ICML 2017 • Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness.
6 code implementations • 3 Jun 2016 • David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal
We propose zoneout, a novel method for regularizing RNNs.
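The core idea, in a minimal sketch: at each timestep a random subset of hidden units keeps its previous value rather than being updated (the simple scalar-parameter `rnn_cell` and rate `z` here are illustrative stand-ins, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in recurrent cell: any cell producing a new hidden state works here.
def rnn_cell(h, x, W=0.5, U=0.5):
    return np.tanh(W * h + U * x)

def zoneout_step(h_prev, x, z=0.15):
    """Unlike dropout, which zeroes units, zoneout preserves a random
    subset of units' *previous* values at each timestep."""
    h_new = rnn_cell(h_prev, x)
    mask = rng.random(h_prev.shape) < z   # True -> "zone out": keep old value
    return np.where(mask, h_prev, h_new)

h = np.zeros(8)
for x in [0.2, -0.4, 1.0]:
    h = zoneout_step(h, x)
```

Preserving (rather than dropping) units keeps gradient and state information flowing through time, which is what distinguishes zoneout from applying dropout to the recurrent connections.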
1 code implementation • 26 Nov 2015 • David Krueger, Roland Memisevic
We stabilize the activations of Recurrent Neural Networks (RNNs) by penalizing the squared distance between successive hidden states' norms.
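The penalty just described can be sketched directly (`beta` and the example hidden states are illustrative):

```python
import numpy as np

def norm_stabilizer(hidden_states, beta=1.0):
    """Penalize the squared difference between successive hidden-state norms.

    hidden_states: array of shape (T, hidden_size); the result is added to
    the task loss with weight beta.
    """
    norms = np.linalg.norm(hidden_states, axis=1)        # ||h_t|| for each t
    return beta * np.mean((norms[1:] - norms[:-1]) ** 2)

H = np.array([[1.0, 0.0],      # ||h_1|| = 1
              [0.0, 1.0],      # ||h_2|| = 1
              [3.0, 4.0]])     # ||h_3|| = 5
penalty = norm_stabilizer(H)
```

Note the regularizer constrains only the *norms* of successive states, not the states themselves, so the hidden representation is otherwise free to change direction.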
no code implementations • 30 Oct 2015 • Philip Bachman, David Krueger, Doina Precup
We investigate attention as the active pursuit of useful information.
19 code implementations • 30 Oct 2014 • Laurent Dinh, David Krueger, Yoshua Bengio
It is based on the idea that a good representation is one in which the data has a distribution that is easy to model.
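The transform that makes the learned distribution easy to model is an additive coupling layer, which can be sketched as follows (`m_fn` is a hypothetical stand-in for the learned coupling network):

```python
import numpy as np

# Additive coupling layer sketch: split the input, leave one half unchanged,
# and shift the other half by a function of the first.
def m_fn(x1):
    return np.sin(x1)            # any function works; it need not be invertible

def coupling_forward(x1, x2):
    return x1, x2 + m_fn(x1)     # volume-preserving: Jacobian determinant is 1

def coupling_inverse(y1, y2):
    return y1, y2 - m_fn(y1)     # exact inverse, no matrix inversion needed

x1, x2 = np.array([0.5, -1.0]), np.array([2.0, 0.3])
y1, y2 = coupling_forward(x1, x2)
r1, r2 = coupling_inverse(y1, y2)
```

Because each layer is trivially invertible with unit Jacobian determinant, stacking them gives a flexible bijection whose exact log-likelihood is cheap to compute.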
no code implementations • 13 Feb 2014 • Kishore Konda, Roland Memisevic, David Krueger
We show that negative biases are a natural result of using a hidden layer whose responsibility is to both represent the input data and act as a selection mechanism that ensures sparsity of the representation.