no code implementations • 31 May 2024 • Davide Paglieri, Saurabh Dash, Tim Rocktäschel, Jack Parker-Holder
The older OPT model, which much of the quantization literature is based on, shows significant performance deterioration and high susceptibility to outliers with varying calibration sets.
no code implementations • 27 Feb 2024 • Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans
Moreover, we demonstrate how, like language models, video generation models can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning.
no code implementations • 26 Feb 2024 • Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu
As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance.
no code implementations • 23 Feb 2024 • Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos.
no code implementations • 24 Jan 2024 • Mikayel Samvelyan, Davide Paglieri, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel
In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial.
no code implementations • 14 Dec 2023 • Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, Luyu Wang, Lei Zhang
Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning.
1 code implementation • NeurIPS 2023 • Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster
Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks.
1 code implementation • 21 Aug 2023 • Ishita Mediratta, Minqi Jiang, Jack Parker-Holder, Michael Dennis, Eugene Vinitsky, Tim Rocktäschel
As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment.
1 code implementation • NeurIPS 2023 • Cong Lu, Philip J. Ball, Yee Whye Teh, Jack Parker-Holder
We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data.
no code implementations • 6 Mar 2023 • Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel
Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents.
no code implementations • 18 Jan 2023 • Adaptive Agent Team, Jakob Bauer, Kate Baumli, Satinder Baveja, Feryal Behbahani, Avishkar Bhoopchand, Nathalie Bradley-Schmieg, Michael Chang, Natalie Clay, Adrian Collister, Vibhavari Dasagi, Lucy Gonzalez, Karol Gregor, Edward Hughes, Sheleem Kashem, Maria Loks-Thompson, Hannah Openshaw, Jack Parker-Holder, Shreya Pathak, Nicolas Perez-Nieves, Nemanja Rakicevic, Tim Rocktäschel, Yannick Schroecker, Jakub Sygnowski, Karl Tuyls, Sarah York, Alexander Zacherl, Lei Zhang
Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL).
2 code implementations • 29 Nov 2022 • Samuel Kessler, Mateusz Ostaszewski, Michał Bortkiewicz, Mateusz Żarski, Maciej Wołczyk, Jack Parker-Holder, Stephen J. Roberts, Piotr Miłoś
World models power some of the most efficient reinforcement learning algorithms.
no code implementations • 23 Oct 2022 • Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip J. Ball, Oleh Rybkin, Stephen J. Roberts, Tim Rocktäschel, Edward Grefenstette
We then present CASCADE, a novel approach for self-supervised exploration in this new setting.
2 code implementations • 23 Jul 2022 • Michael Matthews, Mikayel Samvelyan, Jack Parker-Holder, Edward Grefenstette, Tim Rocktäschel
In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards.
2 code implementations • 19 Jul 2022 • Xingchen Wan, Cong Lu, Jack Parker-Holder, Philip J. Ball, Vu Nguyen, Binxin Ru, Michael A. Osborne
Leveraging the new highly parallelizable Brax physics engine, we show that these innovations lead to large performance gains, significantly outperforming the tuned baseline while learning entire configurations on the fly.
1 code implementation • 11 Jul 2022 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Andrei Lupu, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster
Problematically, in partially-observable or stochastic settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment in the intended deployment setting, while curriculum learning necessarily shifts the training distribution.
2 code implementations • 9 Jun 2022 • Cong Lu, Philip J. Ball, Tim G. J. Rudner, Jack Parker-Holder, Michael A. Osborne, Yee Whye Teh
Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain.
1 code implementation • 22 Mar 2022 • Eric Hambro, Sharada Mohanty, Dmitrii Babaev, Minwoo Byeon, Dipam Chakraborty, Edward Grefenstette, Minqi Jiang, DaeJin Jo, Anssi Kanervisto, Jongmin Kim, Sungwoong Kim, Robert Kirk, Vitaly Kurin, Heinrich Küttler, Taehwon Kwon, Donghoon Lee, Vegard Mella, Nantas Nardelli, Ivan Nazarov, Nikita Ovsov, Jack Parker-Holder, Roberta Raileanu, Karolis Ramanauskas, Tim Rocktäschel, Danielle Rothermel, Mikayel Samvelyan, Dmitry Sorokin, Maciej Sypetkowski, Michał Sypetkowski
In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge.
no code implementations • 8 Mar 2022 • Jaleh Zand, Jack Parker-Holder, Stephen J. Roberts
Training agents in cooperative settings offers the promise of AI agents able to interact effectively with humans (and other agents) in the real world.
3 code implementations • 2 Mar 2022 • Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.
no code implementations • 11 Jan 2022 • Jack Parker-Holder, Raghu Rajan, Xingyou Song, André Biedenkapp, Yingjie Miao, Theresa Eimer, Baohe Zhang, Vu Nguyen, Roberto Calandra, Aleksandra Faust, Frank Hutter, Marius Lindauer
The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents.
no code implementations • 24 Dec 2021 • Jonathan Lorraine, Paul Vicol, Jack Parker-Holder, Tal Kachman, Luke Metz, Jakob Foerster
We generalize this idea to non-conservative, multi-agent gradient systems by proposing a method, denoted Generalized Ridge Rider (GRR), for finding arbitrary bifurcation points.
no code implementations • 4 Nov 2021 • Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano
Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains.
no code implementations • 8 Oct 2021 • Cong Lu, Philip J. Ball, Jack Parker-Holder, Michael A. Osborne, Stephen J. Roberts
Significant progress has been made recently in offline model-based reinforcement learning, a family of approaches that leverage a learned dynamics model.
4 code implementations • NeurIPS 2021 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria.
no code implementations • 29 Sep 2021 • Jack Parker-Holder, Minqi Jiang, Michael D Dennis, Mikayel Samvelyan, Jakob Nicolaus Foerster, Edward Grefenstette, Tim Rocktäschel
Deep Reinforcement Learning (RL) has recently produced impressive results in a series of settings such as games and robotics.
1 code implementation • 27 Sep 2021 • Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel
By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack allows designing custom RL testbeds that are fast and convenient to use.
no code implementations • NeurIPS Workshop ICBINB 2021 • Iryna Korshunova, Minqi Jiang, Jack Parker-Holder, Tim Rocktäschel, Edward Grefenstette
Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample-efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels.
1 code implementation • 16 Jul 2021 • Krzysztof Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamas Sarlos, Adrian Weller, Thomas Weingarten
In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformer architectures in a scalable way.
no code implementations • NeurIPS 2021 • Jack Parker-Holder, Vu Nguyen, Shaan Desai, Stephen Roberts
Despite a series of recent successes in reinforcement learning (RL), many RL algorithms remain sensitive to hyperparameters.
1 code implementation • 5 Jun 2021 • Samuel Kessler, Jack Parker-Holder, Philip Ball, Stefan Zohren, Stephen J. Roberts
In this paper we formalize this "interference" as distinct from the problem of forgetting.
no code implementations • NeurIPS 2021 • Cong Lu, Philip Ball, Jack Parker-Holder, Michael Osborne, S Roberts
Offline reinforcement learning enables agents to make use of large pre-collected datasets of environment transitions and learn control policies without the need for potentially expensive or unsafe online data collection.
no code implementations • ICLR Workshop SSL-RL 2021 • Philip J. Ball, Cong Lu, Jack Parker-Holder, Stephen Roberts
Reinforcement learning from large-scale offline datasets provides us with the ability to learn policies without potentially unsafe or impractical exploration.
no code implementations • 8 Feb 2021 • Krzysztof Marcin Choromanski, Deepali Jain, Wenhao Yu, Xingyou Song, Jack Parker-Holder, Tingnan Zhang, Valerii Likhosherstov, Aldo Pacchiano, Anirban Santara, Yunhao Tang, Jie Tan, Adrian Weller
There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments.
2 code implementations • NeurIPS 2021 • Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael I. Jordan
In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control.
2 code implementations • 19 Jan 2021 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Qiuyi Zhang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang
In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters.
no code implementations • NeurIPS 2020 • Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alex Peysakhovich, Aldo Pacchiano, Jakob Foerster
In the era of ever-decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs).
no code implementations • 21 Jun 2020 • Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts
The principle of optimism in the face of uncertainty is prevalent throughout sequential decision-making problems such as multi-armed bandits and reinforcement learning (RL).
no code implementations • ICML Workshop LifelongML 2020 • Robert Müller, Jack Parker-Holder, Aldo Pacchiano
Meta-learning is a paradigm whereby an agent is trained with the specific goal of fast adaptation.
no code implementations • ICML 2020 • Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani
We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$.
no code implementations • ICML 2020 • Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts
Model-Based Reinforcement Learning (MBRL) offers a promising direction for sample-efficient learning, often achieving state-of-the-art results for continuous control tasks.
2 code implementations • NeurIPS 2020 • Jack Parker-Holder, Vu Nguyen, Stephen Roberts
A recent solution to this problem is Population Based Training (PBT), which updates both weights and hyperparameters in a single training run of a population of agents.
2 code implementations • NeurIPS 2020 • Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts
Exploration is a key problem in reinforcement learning, since agents can only learn from data they acquire in the environment.
no code implementations • 25 Sep 2019 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang
We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way.
no code implementations • 25 Sep 2019 • Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan
We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.
no code implementations • 10 Jul 2019 • Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Deepali Jain, Yuxiang Yang
We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way.
1 code implementation • ICML 2020 • Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan
We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space.
no code implementations • 29 May 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang
We propose a new class of structured methods for Monte Carlo (MC) sampling, called DPPMC, designed for high-dimensional nonisotropic distributions where samples are correlated to reduce the variance of the estimator via determinantal point processes.
1 code implementation • NeurIPS 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang
ASEBO adapts to the geometry of the function and learns, on the fly, optimal sets of sensing directions, which are used to probe it.
no code implementations • 7 Mar 2019 • Krzysztof Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Deepali Jain, Yuxiang Yang, Atil Iscen, Jasmine Hsu, Vikas Sindhwani
Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state-of-the-art methods for policy optimization problems in Robotics.
no code implementations • 9 Jan 2018 • Jack Parker-Holder, Sam Gass
The popularity of deep learning is increasing by the day.