1 code implementation • 15 Apr 2024 • Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs).
no code implementations • 15 Mar 2024 • Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen
To prepare for these decisions, we investigate how developers could make a 'safety case': a structured argument that an AI system is unlikely to cause a catastrophe.
no code implementations • 4 Mar 2024 • James Urquhart Allingham, Bruno Kacper Mlodozeniec, Shreyas Padhy, Javier Antorán, David Krueger, Richard E. Turner, Eric Nalisnick, José Miguel Hernández-Lobato
Correctly capturing the symmetry transformations of data can lead to efficient models with strong generalization capabilities, though methods incorporating symmetries often require prior knowledge.
no code implementations • 25 Jan 2024 • Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell
External audits of AI systems are increasingly recognized as a key mechanism for AI governance.
no code implementations • 23 Jan 2024 • Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, Markus Anderljung
Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks.
no code implementations • 22 Dec 2023 • Alan Chan, Ben Bucknall, Herbie Bradley, David Krueger
Public release of the weights of pretrained foundation models, otherwise known as downloadable access (Solaiman et al., 2023), enables fine-tuning without the prohibitive expense of pretraining.
no code implementations • 26 Oct 2023 • Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila Mcilraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann
In this short consensus paper, we outline risks from upcoming, advanced AI systems.
1 code implementation • 23 Oct 2023 • Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, David Krueger
Brown et al. (2020) famously introduced the phenomenon of in-context learning in large language models (LLMs).
1 code implementation • 4 Oct 2023 • Thomas Coste, Usman Anwar, Robert Kirk, David Krueger
Gao et al. (2023) studied this phenomenon in a synthetic human feedback setup, with a significantly larger "gold" reward model acting as the true reward (instead of humans), and showed that overoptimization remains a persistent problem regardless of the size of the proxy reward model and the amount of training data used.
no code implementations • 27 Jul 2023 • Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals.
1 code implementation • NeurIPS 2023 • Stephen Chung, Ivan Anokhin, David Krueger
This approach eliminates the need for handcrafted planning algorithms: the agent learns to plan autonomously, and its plan can be easily interpreted through visualization.
1 code implementation • 19 Apr 2023 • Shoaib Ahmed Siddiqui, David Krueger, Thomas Breuel
Modern deep learning architectures for object recognition generalize well to novel views, but the mechanisms are not well understood.
1 code implementation • 10 Mar 2023 • Xander Davies, Lauro Langosco, David Krueger
A principled understanding of generalization in deep learning may require unifying disparate observations under a single conceptual framework.
1 code implementation • 3 Feb 2023 • Shoaib Ahmed Siddiqui, David Krueger, Yann LeCun, Stéphane Deny
Current state-of-the-art deep networks are all powered by backpropagation.
no code implementations • 9 Jan 2023 • Lev McKinney, Yawen Duan, David Krueger, Adam Gleave
Our work focuses on demonstrating and studying the causes of these relearning failures in the domain of preference-based reward learning.
no code implementations • 27 Nov 2022 • Alan Clark, Shoaib Ahmed Siddiqui, Robert Kirk, Usman Anwar, Stephen Chung, David Krueger
Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin.
1 code implementation • 15 Nov 2022 • Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka
We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss.
1 code implementation • 26 Oct 2022 • Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger
Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms cannot express, such as the non-monotonic transitions present in the scaling behavior of phenomena like double descent, and the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic.
1 code implementation • 6 Oct 2022 • Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan
Compared to existing methods, we obtain similar or superior worst-case adversarial robustness on attacks seen during training.
no code implementations • 27 Sep 2022 • Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David Krueger
We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\mathcal{\tilde{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$.
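The proxy-versus-true-reward gap can be illustrated with a minimal toy sketch (not the paper's formalism; the feature setup and both reward functions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: 100 candidate policies, each summarized by two features.
# The true reward R values both features, while the imperfect proxy
# R_tilde rewards the first feature and penalizes the second.
features = rng.uniform(0, 1, size=(100, 2))

def true_reward(f):
    return f[0] + f[1]            # R: the reward we actually care about

def proxy_reward(f):
    return f[0] - 0.5 * f[1]      # R_tilde: a misspecified proxy

# "Optimizing" each reward = selecting the highest-scoring candidate policy.
best_proxy = max(range(len(features)), key=lambda i: proxy_reward(features[i]))
best_true = max(range(len(features)), key=lambda i: true_reward(features[i]))

# The proxy-optimal policy underperforms on the true reward: reward hacking.
gap = true_reward(features[best_true]) - true_reward(features[best_proxy])
```

Because the proxy systematically undervalues the second feature, selecting for it steers away from policies the true reward prefers.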
1 code implementation • 20 Sep 2022 • Shoaib Ahmed Siddiqui, Nitarshan Rajkumar, Tegan Maharaj, David Krueger, Sara Hooker
Modern machine learning research relies on relatively few carefully curated datasets.
1 code implementation • 27 Dec 2021 • Enoch Tetteh, Joseph Viviano, Yoshua Bengio, David Krueger, Joseph Paul Cohen
Learning models that generalize under different distribution shifts in medical imaging has been a long-standing research challenge.
no code implementations • 14 Dec 2021 • Shahar Avin, Haydn Belfield, Miles Brundage, Gretchen Krueger, Jasmine Wang, Adrian Weller, Markus Anderljung, Igor Krawczuk, David Krueger, Jonathan Lebensold, Tegan Maharaj, Noa Zilberman
The range of application of artificial intelligence (AI) is vast, as is the potential for harm.
no code implementations • 29 Sep 2021 • David Krueger, Tegan Maharaj, Jan Leike
We use these unit tests to demonstrate that changes to the learning algorithm (e.g., introducing meta-learning) can cause previously hidden incentives to be revealed, resulting in qualitatively different behaviour despite no change in the performance metric.
4 code implementations • 28 May 2021 • Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, Laurent Orseau, David Krueger
We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL).
no code implementations • 13 Nov 2020 • David Krueger, Jan Leike, Owain Evans, John Salvatier
Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0.
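The query-cost mechanic can be sketched with a minimal two-armed bandit (illustrative only, not the paper's algorithm; the explore-then-exploit schedule and all constants are invented):

```python
import random

random.seed(0)

# Active-RL bandit sketch: two arms with unknown Bernoulli payoffs.
# Observing a reward costs c > 0, so the agent queries (pays to observe)
# only during an initial exploration phase, then acts greedily without querying.
c = 0.1
true_means = [0.3, 0.7]
estimates = [0.0, 0.0]
counts = [0, 0]
total_payoff = 0.0

for t in range(1000):
    if t < 100:                       # exploration: pay the query cost
        arm = t % 2
        r = 1.0 if random.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]
        total_payoff += r - c         # reward observed, cost paid
    else:                             # exploitation: act greedily, never query
        arm = max(range(2), key=lambda a: estimates[a])
        r = 1.0 if random.random() < true_means[arm] else 0.0
        total_payoff += r             # reward earned but never observed
```

The key departure from standard RL is that the agent's return includes the cost of every observation it chooses to make, so deciding *when* to look at the reward becomes part of the problem.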
no code implementations • 19 Sep 2020 • David Krueger, Tegan Maharaj, Jan Leike
We introduce the term auto-induced distributional shift (ADS) to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs.
no code implementations • 30 May 2020 • Andrew Critch, David Krueger
Framed in positive terms, this report examines how technical AI research might be steered in a manner that is more attentive to humanity's long-term prospects for survival as a species.
no code implementations • 15 Apr 2020 • Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, Sarah de Haas, Maritza Johnson, Ben Laurie, Alex Ingerman, Igor Krawczuk, Amanda Askell, Rosario Cammarota, Andrew Lohn, David Krueger, Charlotte Stix, Peter Henderson, Logan Graham, Carina Prunkl, Bianca Martin, Elizabeth Seger, Noa Zilberman, Seán Ó hÉigeartaigh, Frens Kroeger, Girish Sastry, Rebecca Kagan, Adrian Weller, Brian Tse, Elizabeth Barnes, Allan Dafoe, Paul Scharre, Ariel Herbert-Voss, Martijn Rasser, Shagun Sodhani, Carrick Flynn, Thomas Krendl Gilbert, Lisa Dyer, Saif Khan, Yoshua Bengio, Markus Anderljung
With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development.
4 code implementations • 2 Mar 2020 • David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, Aaron Courville
Distributional shift is one of the major obstacles when transferring machine learning prediction systems from the lab to the real world.
3 code implementations • 19 Nov 2018 • Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg
One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions.
no code implementations • ICLR 2019 • Alexandre Lacoste, Boris Oreshkin, Wonchang Chung, Thomas Boquet, Negar Rostamzadeh, David Krueger
The result is a rich and meaningful prior capable of few-shot learning on new tasks.
5 code implementations • ICML 2018 • Chin-wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville
Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF).
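The affine autoregressive transform that MAF and IAF share can be sketched as follows (a minimal illustration; `mu_fn` and `log_sigma_fn` are hypothetical stand-ins for the learned conditioner networks):

```python
import numpy as np

# Hypothetical "conditioners": any functions of the earlier dimensions work.
def mu_fn(x_prev):
    return 0.5 * float(np.sum(x_prev))

def log_sigma_fn(x_prev):
    return 0.1 * float(np.sum(x_prev))

def forward(x):
    """Data -> noise. Each u_i depends only on x_{<i}, so density evaluation
    is parallelizable (MAF's fast direction)."""
    u = np.empty_like(x)
    log_det = 0.0
    for i in range(len(x)):
        mu, ls = mu_fn(x[:i]), log_sigma_fn(x[:i])
        u[i] = (x[i] - mu) * np.exp(-ls)
        log_det -= ls
    return u, log_det

def inverse(u):
    """Noise -> data. Sequential, since x_i needs the already-reconstructed
    x_{<i} (IAF parameterizes the opposite direction to make sampling fast)."""
    x = np.empty_like(u)
    for i in range(len(u)):
        mu, ls = mu_fn(x[:i]), log_sigma_fn(x[:i])
        x[i] = u[i] * np.exp(ls) + mu
    return x

x = np.array([0.3, -1.2, 0.7])
u, log_det = forward(x)
x_rec = inverse(u)
```

The autoregressive structure makes one direction of the flow parallel and the other sequential; MAF and IAF are the two choices of which direction gets to be fast.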
1 code implementation • 31 Jan 2018 • Joel Ruben Antony Moniz, David Krueger
We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory.
no code implementations • 13 Dec 2017 • Alexandre Lacoste, Thomas Boquet, Negar Rostamzadeh, Boris Oreshkin, Wonchang Chung, David Krueger
The recent literature on deep learning offers new tools to learn a rich probability distribution over high dimensional data such as images or sounds.
no code implementations • ICLR 2018 • David Krueger, Chin-wei Huang, Riashat Islam, Ryan Turner, Alexandre Lacoste, Aaron Courville
We study Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks.
2 code implementations • ICML 2017 • Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness.
6 code implementations • 3 Jun 2016 • David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal
We propose zoneout, a novel method for regularizing RNNs.
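The core idea, in a minimal sketch: at each timestep a random subset of hidden units keeps its previous value rather than being updated (the simple scalar-parameter `rnn_cell` and rate `z` here are illustrative stand-ins, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in recurrent cell: any cell producing a new hidden state works here.
def rnn_cell(h, x, W=0.5, U=0.5):
    return np.tanh(W * h + U * x)

def zoneout_step(h_prev, x, z=0.15):
    """Unlike dropout, which zeroes units, zoneout preserves a random
    subset of units' *previous* values at each timestep."""
    h_new = rnn_cell(h_prev, x)
    mask = rng.random(h_prev.shape) < z   # True -> "zone out": keep old value
    return np.where(mask, h_prev, h_new)

h = np.zeros(8)
for x in [0.2, -0.4, 1.0]:
    h = zoneout_step(h, x)
```

Preserving (rather than dropping) units keeps gradient and state information flowing through time, which is what distinguishes zoneout from applying dropout to the recurrent connections.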
1 code implementation • 26 Nov 2015 • David Krueger, Roland Memisevic
We stabilize the activations of Recurrent Neural Networks (RNNs) by penalizing the squared distance between successive hidden states' norms.
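The penalty just described can be sketched directly (`beta` and the example hidden states are illustrative):

```python
import numpy as np

def norm_stabilizer(hidden_states, beta=1.0):
    """Penalize the squared difference between successive hidden-state norms.

    hidden_states: array of shape (T, hidden_size); the result is added to
    the task loss with weight beta.
    """
    norms = np.linalg.norm(hidden_states, axis=1)        # ||h_t|| for each t
    return beta * np.mean((norms[1:] - norms[:-1]) ** 2)

H = np.array([[1.0, 0.0],      # ||h_1|| = 1
              [0.0, 1.0],      # ||h_2|| = 1
              [3.0, 4.0]])     # ||h_3|| = 5
penalty = norm_stabilizer(H)
```

Note the regularizer constrains only the *norms* of successive states, not the states themselves, so the hidden representation is otherwise free to change direction.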
no code implementations • 30 Oct 2015 • Philip Bachman, David Krueger, Doina Precup
We investigate attention as the active pursuit of useful information.
19 code implementations • 30 Oct 2014 • Laurent Dinh, David Krueger, Yoshua Bengio
It is based on the idea that a good representation is one in which the data has a distribution that is easy to model.
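The transform that makes the learned distribution easy to model is an additive coupling layer, which can be sketched as follows (`m_fn` is a hypothetical stand-in for the learned coupling network):

```python
import numpy as np

# Additive coupling layer sketch: split the input, leave one half unchanged,
# and shift the other half by a function of the first.
def m_fn(x1):
    return np.sin(x1)            # any function works; it need not be invertible

def coupling_forward(x1, x2):
    return x1, x2 + m_fn(x1)     # volume-preserving: Jacobian determinant is 1

def coupling_inverse(y1, y2):
    return y1, y2 - m_fn(y1)     # exact inverse, no matrix inversion needed

x1, x2 = np.array([0.5, -1.0]), np.array([2.0, 0.3])
y1, y2 = coupling_forward(x1, x2)
r1, r2 = coupling_inverse(y1, y2)
```

Because each layer is trivially invertible with unit Jacobian determinant, stacking them gives a flexible bijection whose exact log-likelihood is cheap to compute.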
no code implementations • 13 Feb 2014 • Kishore Konda, Roland Memisevic, David Krueger
We show that negative biases are a natural result of using a hidden layer whose responsibility is to both represent the input data and act as a selection mechanism that ensures sparsity of the representation.