Search Results for author: Jordi Grau-Moya

Found 23 papers, 6 papers with code

Grandmaster-Level Chess Without Search

no code implementations • 7 Feb 2024 • Anian Ruoss, Grégoire Delétang, Sourabh Medapati, Jordi Grau-Moya, Li Kevin Wenliang, Elliot Catt, John Reid, Tim Genewein

Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games.

Learning Universal Predictors

1 code implementation • 26 Jan 2024 • Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Grégoire Delétang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness

Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data.

Meta-Learning

Language Modeling Is Compression

1 code implementation • 19 Sep 2023 • Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness

We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning.

In-Context Learning, Language Modelling

Randomized Positional Encodings Boost Length Generalization of Transformers

1 code implementation • 26 May 2023 • Anian Ruoss, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, Joel Veness

Transformers have impressive generalization capabilities on tasks with a fixed context length.

Memory-Based Meta-Learning on Non-Stationary Distributions

1 code implementation • 6 Feb 2023 • Tim Genewein, Grégoire Delétang, Anian Ruoss, Li Kevin Wenliang, Elliot Catt, Vincent Dutordoir, Jordi Grau-Moya, Laurent Orseau, Marcus Hutter, Joel Veness

Memory-based meta-learning is a technique for approximating Bayes-optimal predictors.

Bayesian Inference, Meta-Learning

Beyond Bayes-optimality: meta-learning what you know you don't know

no code implementations • 30 Sep 2022 • Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Tim Genewein, Elliot Catt, Kevin Li, Anian Ruoss, Chris Cundy, Joel Veness, Jane Wang, Marcus Hutter, Christopher Summerfield, Shane Legg, Pedro Ortega

This is in contrast to risk-sensitive agents, which additionally exploit the higher-order moments of the return, and ambiguity-sensitive agents, which act differently when recognizing situations in which they lack knowledge.

Decision Making, Meta-Learning

Neural Networks and the Chomsky Hierarchy

2 code implementations • 5 Jul 2022 • Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega

Reliable generalization lies at the heart of safe ML and AI.

Your Policy Regularizer is Secretly an Adversary

no code implementations • 23 Mar 2022 • Rob Brekelmans, Tim Genewein, Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Shane Legg, Pedro Ortega

Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy.

Model-Free Risk-Sensitive Reinforcement Learning

no code implementations • 4 Nov 2021 • Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, Pedro A. Ortega

Since the Gaussian free energy is known to be a certainty-equivalent sensitive to the mean and the variance, the learning rule has applications in risk-sensitive decision-making.

Decision Making, reinforcement-learning, +1

Shaking the foundations: delusions in sequence models for interaction and control

no code implementations • 20 Oct 2021 • Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat, Tom Everitt, Corentin Tallec, Emilio Parisotto, Tom Erez, Yutian Chen, Scott Reed, Marcus Hutter, Nando de Freitas, Shane Legg

The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains.

counterfactual

Stochastic Approximation of Gaussian Free Energy for Risk-Sensitive Reinforcement Learning

no code implementations • NeurIPS 2021 • Grégoire Delétang, Jordi Grau-Moya, Markus Kunesch, Tim Genewein, Rob Brekelmans, Shane Legg, Pedro A. Ortega

Since the Gaussian free energy is known to be a certainty-equivalent sensitive to the mean and the variance, the learning rule has applications in risk-sensitive decision-making.

Decision Making, reinforcement-learning, +1

Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow

2 code implementations • 26 Mar 2021 • John McLeod, Hrvoje Stojic, Vincent Adam, Dongho Kim, Jordi Grau-Moya, Peter Vrancx, Felix Leibfried

This paves the way for new research directions, e.g., investigating uncertainty-aware environment models that are not necessarily neural-network-based, or developing algorithms to solve industrially motivated benchmarks that share characteristics with real-world problems.

Model-based Reinforcement Learning, reinforcement-learning, +1

Causal Analysis of Agent Behavior for AI Safety

no code implementations • 5 Mar 2021 • Grégoire Delétang, Jordi Grau-Moya, Miljan Martic, Tim Genewein, Tom McGrath, Vladimir Mikulik, Markus Kunesch, Shane Legg, Pedro A. Ortega

As machine learning systems become more powerful they also become increasingly unpredictable and opaque.

BIG-bench Machine Learning

Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning

no code implementations • 11 Sep 2019 • Felix Leibfried, Jordi Grau-Moya

While this has been initially proposed for Markov Decision Processes (MDPs) in tabular settings, it was recently shown that a similar principle leads to significant improvements over vanilla SQL in RL for high-dimensional domains with discrete actions and function approximators.

Q-Learning, Reinforcement Learning (RL)

A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment

no code implementations • NeurIPS 2019 • Felix Leibfried, Sergio Pascual-Diaz, Jordi Grau-Moya

In this paper, we investigate the use of empowerment in the presence of an extrinsic reward signal.

Reinforcement Learning (RL)

Disentangled Skill Embeddings for Reinforcement Learning

no code implementations • 21 Jun 2019 • Janith C. Petangoda, Sergio Pascual-Diaz, Vincent Adam, Peter Vrancx, Jordi Grau-Moya

We propose a novel framework for multi-task reinforcement learning (MTRL).

Hierarchical Reinforcement Learning, reinforcement-learning, +2

Soft Q-Learning with Mutual-Information Regularization

no code implementations • ICLR 2019 • Jordi Grau-Moya, Felix Leibfried, Peter Vrancx

We show that the prior optimization introduces a mutual-information regularizer in the RL objective.

Decision Making, Q-Learning, +1

Balancing Two-Player Stochastic Games with Soft Q-Learning

no code implementations • 9 Feb 2018 • Jordi Grau-Moya, Felix Leibfried, Haitham Bou-Ammar

Within the context of video games the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers.

Q-Learning, Reinforcement Learning (RL), +1

An Information-Theoretic Optimality Principle for Deep Reinforcement Learning

no code implementations • 6 Aug 2017 • Felix Leibfried, Jordi Grau-Moya, Haitham Bou-Ammar

Different learning outcomes can be demonstrated by tuning a Lagrange multiplier accordingly.

reinforcement-learning, Reinforcement Learning (RL), +2

Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

no code implementations • 7 Apr 2016 • Jordi Grau-Moya, Felix Leibfried, Tim Genewein, Daniel A. Braun

As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning.

Adaptive information-theoretic bounded rational decision-making with parametric priors

no code implementations • 5 Nov 2015 • Jordi Grau-Moya, Daniel A. Braun

Here we derive a sampling-based alternative update rule for the adaptation of prior behaviors of decision-makers and we show convergence to the optimal prior predicted by rate distortion theory.

Decision Making

Bounded Rational Decision-Making in Changing Environments

no code implementations • 24 Dec 2013 • Jordi Grau-Moya, Daniel A. Braun

When this requirement is not fulfilled, the decision-maker will suffer inefficiencies in utility, that arise because the current policy is optimal for an environment in the past.

Decision Making

A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

no code implementations • NeurIPS 2012 • Pedro Ortega, Jordi Grau-Moya, Tim Genewein, David Balduzzi, Daniel Braun

We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions.

Stochastic Optimization
