no code implementations • 31 Jan 2022 • Augusto Peres, Eduardo Dias, Luís Sarmento, Hugo Penedones
We propose a message passing neural network architecture designed to be equivariant to column and row permutations of a matrix.
no code implementations • NeurIPS 2019 • Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu
We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation.
no code implementations • 9 Jul 2018 • Hugo Penedones, Damien Vincent, Hartmut Maennel, Sylvain Gelly, Timothy Mann, Andre Barreto
Temporal-Difference (TD) learning [Sutton, 1988] with function approximation can converge to solutions that are worse than those obtained by Monte Carlo regression, even in the simple case of on-policy evaluation.
no code implementations • 30 Dec 2016 • Timothy A. Mann, Hugo Penedones, Shie Mannor, Todd Hester
Temporal Difference learning, or TD($\lambda$), is a fundamental algorithm in the field of reinforcement learning.