Search Results for author: Bernardo Avila Pires

Found 10 papers, 3 papers with code

Human Alignment of Large Language Models through Online Preference Optimisation

no code implementations • 13 Mar 2024 • Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

Building on this equivalence, we introduce the IPO-MD algorithm that generates data with a mixture policy (between the online and reference policy) similarly as the general Nash-MD algorithm.

Paper
Add Code

Understanding plasticity in neural networks

no code implementations • 2 Mar 2023 • Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila Pires, Razvan Pascanu, Will Dabney

Plasticity, the ability of a neural network to quickly change its predictions in response to new information, is essential for the adaptability and robustness of deep reinforcement learning systems.

Atari Games

Paper
Add Code

Hierarchical Reinforcement Learning in Complex 3D Environments

no code implementations • 28 Feb 2023 • Bernardo Avila Pires, Feryal Behbahani, Hubert Soyer, Kyriacos Nikiforou, Thomas Keck, Satinder Singh

Hierarchical Reinforcement Learning (HRL) agents have the potential to demonstrate appealing capabilities such as planning and exploration with abstraction, transfer, and skill reuse.

Hierarchical Reinforcement Learning reinforcement-learning +1

Paper
Add Code

BYOL-Explore: Exploration by Bootstrapped Prediction

no code implementations • 16 Jun 2022 • Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot

We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments.

Paper
Add Code

Neural Recursive Belief States in Multi-Agent Reinforcement Learning

no code implementations • 3 Feb 2021 • Pol Moreno, Edward Hughes, Kevin R. McKee, Bernardo Avila Pires, Théophane Weber

We also show that higher-order belief models outperform agents with lower-order models.

Decision Making Multi-agent Reinforcement Learning +2

Paper
Add Code

Geometric Entropic Exploration

no code implementations • 6 Jan 2021 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos

Exploration is essential for solving complex Reinforcement Learning (RL) tasks.

Reinforcement Learning (RL)

Paper
Add Code

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning

8 code implementations • NeurIPS 2020 • Jean-bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Remi Munos, Michal Valko

From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.

Linear evaluation Representation Learning +1

3,094

Paper
Code

Bootstrap your own latent: A new approach to self-supervised Learning

31 code implementations • 13 Jun 2020 • Jean-bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.

Ranked #2 on Self-Supervised Person Re-Identification on SYSU-30k

Linear evaluation Representation Learning +4

12,829

Paper
Code

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

no code implementations • ICML 2020 • Daniel Guo, Bernardo Avila Pires, Bilal Piot, Jean-bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar

These latent embeddings are themselves trained to be predictive of the aforementioned representations.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

World Discovery Models

1 code implementation • 20 Feb 2019 • Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo Avila Pires, Jean-bastien Grill, Florent Altché, Rémi Munos

As humans we are driven by a strong desire for seeking novelty in our world.

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.