no code implementations • 7 Feb 2024 • Anian Ruoss, Grégoire Delétang, Sourabh Medapati, Jordi Grau-Moya, Li Kevin Wenliang, Elliot Catt, John Reid, Tim Genewein
Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games.
1 code implementation • 26 Jan 2024 • Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Grégoire Delétang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness
Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data.
1 code implementation • 19 Sep 2023 • Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness
We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning.
1 code implementation • 26 May 2023 • Anian Ruoss, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, Joel Veness
Transformers have impressive generalization capabilities on tasks with a fixed context length.
1 code implementation • 6 Feb 2023 • Tim Genewein, Grégoire Delétang, Anian Ruoss, Li Kevin Wenliang, Elliot Catt, Vincent Dutordoir, Jordi Grau-Moya, Laurent Orseau, Marcus Hutter, Joel Veness
Memory-based meta-learning is a technique for approximating Bayes-optimal predictors.
no code implementations • 30 Sep 2022 • Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Tim Genewein, Elliot Catt, Kevin Li, Anian Ruoss, Chris Cundy, Joel Veness, Jane Wang, Marcus Hutter, Christopher Summerfield, Shane Legg, Pedro Ortega
This is in contrast to risk-sensitive agents, which additionally exploit the higher-order moments of the return, and ambiguity-sensitive agents, which act differently when recognizing situations in which they lack knowledge.
2 code implementations • 5 Jul 2022 • Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega
Reliable generalization lies at the heart of safe ML and AI.
no code implementations • 23 Mar 2022 • Rob Brekelmans, Tim Genewein, Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Shane Legg, Pedro Ortega
Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy.
no code implementations • 4 Nov 2021 • Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, Pedro A. Ortega
Since the Gaussian free energy is known to be a certainty-equivalent sensitive to the mean and the variance, the learning rule has applications in risk-sensitive decision-making.
no code implementations • 20 Oct 2021 • Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat, Tom Everitt, Corentin Tallec, Emilio Parisotto, Tom Erez, Yutian Chen, Scott Reed, Marcus Hutter, Nando de Freitas, Shane Legg
The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains.
no code implementations • NeurIPS 2021 • Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, Pedro A Ortega
Since the Gaussian free energy is known to be a certainty-equivalent sensitive to the mean and the variance, the learning rule has applications in risk-sensitive decision-making.
2 code implementations • 26 Mar 2021 • John McLeod, Hrvoje Stojic, Vincent Adam, Dongho Kim, Jordi Grau-Moya, Peter Vrancx, Felix Leibfried
This paves the way for new research directions, e.g., investigating uncertainty-aware environment models that are not necessarily neural-network-based, or developing algorithms to solve industrially motivated benchmarks that share characteristics with real-world problems.
no code implementations • 5 Mar 2021 • Grégoire Delétang, Jordi Grau-Moya, Miljan Martic, Tim Genewein, Tom McGrath, Vladimir Mikulik, Markus Kunesch, Shane Legg, Pedro A. Ortega
As machine learning systems become more powerful they also become increasingly unpredictable and opaque.
no code implementations • 11 Sep 2019 • Felix Leibfried, Jordi Grau-Moya
While this was initially proposed for Markov Decision Processes (MDPs) in tabular settings, it was recently shown that a similar principle leads to significant improvements over vanilla soft Q-learning (SQL) in RL for high-dimensional domains with discrete actions and function approximators.
no code implementations • NeurIPS 2019 • Felix Leibfried, Sergio Pascual-Diaz, Jordi Grau-Moya
In this paper, we investigate the use of empowerment in the presence of an extrinsic reward signal.
no code implementations • 21 Jun 2019 • Janith C. Petangoda, Sergio Pascual-Diaz, Vincent Adam, Peter Vrancx, Jordi Grau-Moya
We propose a novel framework for multi-task reinforcement learning (MTRL).
no code implementations • ICLR 2019 • Jordi Grau-Moya, Felix Leibfried, Peter Vrancx
We show that the prior optimization introduces a mutual-information regularizer in the RL objective.
no code implementations • 9 Feb 2018 • Jordi Grau-Moya, Felix Leibfried, Haitham Bou-Ammar
Within the context of video games, the notion of perfectly rational agents can be undesirable, as it leads to uninteresting situations in which humans face tough adversarial decision makers.
no code implementations • 6 Aug 2017 • Felix Leibfried, Jordi Grau-Moya, Haitham Bou-Ammar
Different learning outcomes can be demonstrated by tuning a Lagrange multiplier accordingly.
no code implementations • 7 Apr 2016 • Jordi Grau-Moya, Felix Leibfried, Tim Genewein, Daniel A. Braun
As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning.
no code implementations • 5 Nov 2015 • Jordi Grau-Moya, Daniel A. Braun
Here we derive a sampling-based alternative update rule for the adaptation of prior behaviors of decision-makers and we show convergence to the optimal prior predicted by rate distortion theory.
no code implementations • 24 Dec 2013 • Jordi Grau-Moya, Daniel A. Braun
When this requirement is not fulfilled, the decision-maker will suffer inefficiencies in utility that arise because the current policy is optimal for an environment in the past.
no code implementations • NeurIPS 2012 • Pedro Ortega, Jordi Grau-Moya, Tim Genewein, David Balduzzi, Daniel Braun
We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions.