no code implementations • 6 Feb 2024 • Brett Daley, Martha White, Marlos C. Machado
Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to improve the sample efficiency of reinforcement learning (RL) methods.
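For concreteness, a minimal sketch of both estimators computed from a sampled episode, assuming `values[t]` approximates $V(s_t)$ and the episode terminates at step `T` (names and signatures are illustrative, not taken from the paper):

```python
import numpy as np

def n_step_return(rewards, values, t, n, gamma):
    """n-step return: n discounted rewards plus a bootstrapped value."""
    T = len(rewards)
    h = min(t + n, T)  # truncate at the end of the episode
    G = sum(gamma ** (k - t) * rewards[k] for k in range(t, h))
    if h < T:  # bootstrap only if the episode has not terminated
        G += gamma ** (h - t) * values[h]
    return G

def lambda_returns(rewards, values, gamma, lam):
    """λ-returns via the backward recursion
    G_t = r_t + γ[(1 − λ) V(s_{t+1}) + λ G_{t+1}]."""
    T = len(rewards)
    G = np.zeros(T)
    g = 0.0  # the return beyond the terminal state is zero
    for t in reversed(range(T)):
        v_next = values[t + 1] if t + 1 < T else 0.0
        g = rewards[t] + gamma * ((1 - lam) * v_next + lam * g)
        G[t] = g
    return G
```

The λ-return is the exponentially weighted average of all n-step returns; the backward recursion computes every target for the episode in a single O(T) pass.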
1 code implementation • 26 Jan 2023 • Brett Daley, Martha White, Christopher Amato, Marlos C. Machado
Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging.
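The tension referred to here is easiest to see in the generic off-policy return written as a sum of trace-weighted TD errors, the common form behind importance sampling, Tree Backup, and Retrace (background only, not the paper's proposed estimator; names are illustrative):

```python
def corrected_return(q_t, td_errors, traces, gamma):
    """Off-policy multistep return in the common trace-weighted form:
        G_t = Q(s_t, a_t) + Σ_{k≥0} γ^k (Π_{i=1..k} c_i) δ_{t+k}.
    With c_i = ρ_i (full importance sampling) the estimate is unbiased,
    but the product of ratios inflates variance; truncating the traces
    (e.g., c_i = λ·min(1, ρ_i) as in Retrace) tames the variance at the
    cost of cutting the return short."""
    G = q_t
    c_prod = 1.0
    for k, (delta, c) in enumerate(zip(td_errors, traces)):
        if k > 0:
            c_prod *= c  # traces apply from the step after t
        G += (gamma ** k) * c_prod * delta
    return G
```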
no code implementations • 4 Jun 2022 • Brett Daley, Isaac Chan
Q($\sigma$) is a recently proposed temporal-difference learning method that interpolates between learning from expected backups and sampled backups.
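Concretely, the one-step Q($\sigma$) target mixes the two backup types with a single coefficient; a sketch assuming discrete actions, with `q_next` the action values at the next state and `pi_next` the target policy's probabilities there:

```python
import numpy as np

def q_sigma_target(reward, gamma, q_next, pi_next, a_next, sigma):
    """One-step Q(σ) backup (De Asis et al., 2018):
        G = r + γ[σ · q(s', a') + (1 − σ) · Σ_a π(a|s') q(s', a)].
    σ = 1 recovers the sampled (Sarsa) backup, σ = 0 the expected
    (Expected Sarsa) backup; values in between interpolate."""
    sampled = q_next[a_next]            # sampled backup term
    expected = np.dot(pi_next, q_next)  # expected backup term
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)
```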
no code implementations • 23 Dec 2021 • Brett Daley, Christopher Amato
Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, particularly in the experience replay setting now commonly used with deep neural networks.
1 code implementation • 6 Dec 2021 • Brett Daley, Christopher Amato
Return caching is a recent strategy that enables efficient minibatch training with multistep estimators (e.g., the $\lambda$-return) for deep reinforcement learning.
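A rough sketch of the caching idea, reusing the `lambda_returns` helper above: targets for whole trajectories are recomputed periodically, after which training can draw ordinary uniform minibatches against the cached values (the function and its signature are illustrative, not the paper's API):

```python
def refresh_cache(trajectories, value_fn, gamma, lam):
    """Precompute λ-return targets for every stored trajectory so that
    minibatch updates avoid recomputing multistep returns per sample."""
    cache = []
    for states, rewards in trajectories:
        values = value_fn(states)  # value estimates under the current network
        targets = lambda_returns(rewards, values, gamma, lam)
        cache.extend(zip(states, targets))
    return cache  # sample (state, target) minibatches uniformly from here
```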
1 code implementation • 1 Nov 2021 • Brett Daley, Christopher Amato
Deep Q-Network (DQN) marked a major milestone for reinforcement learning, demonstrating for the first time that human-level control policies could be learned directly from raw visual inputs via reward maximization.
no code implementations • 10 Jun 2021 • Brett Daley, Christopher Amato
Adam is an adaptive gradient method that has experienced widespread adoption due to its fast and reliable training performance.
no code implementations • 22 Feb 2021 • Brett Daley, Cameron Hickert, Christopher Amato
Our theory prescribes a special non-uniform distribution to cancel this effect, and we propose a stratified sampling scheme to efficiently implement it.
no code implementations • 8 Feb 2021 • Xueguang Lyu, Yuchen Xiao, Brett Daley, Christopher Amato
Centralized Training for Decentralized Execution, where agents are trained offline using centralized information but execute in a decentralized manner online, has gained popularity in the multi-agent reinforcement learning community.
1 code implementation • 19 Oct 2020 • Hai Nguyen, Brett Daley, Xinchao Song, Christopher Amato, Robert Platt
Many important robotics problems are partially observable in the sense that a single visual or force-feedback measurement is insufficient to reconstruct the state.
1 code implementation • 3 Oct 2020 • Brett Daley, Christopher Amato
Many popular adaptive gradient methods such as Adam and RMSProp rely on an exponential moving average (EMA) to normalize their stepsizes.
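For reference, the shared mechanism in miniature, written as an RMSProp-style update (hyperparameter values are typical defaults, not the paper's):

```python
import numpy as np

def ema_normalized_step(param, grad, v, lr=1e-3, beta=0.999, eps=1e-8):
    """One update with EMA stepsize normalization: an exponential moving
    average of the squared gradient rescales the step per parameter."""
    v = beta * v + (1.0 - beta) * grad ** 2  # EMA of squared gradients
    param = param - lr * grad / (np.sqrt(v) + eps)
    return param, v
```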
1 code implementation • NeurIPS 2019 • Brett Daley, Christopher Amato
Modern deep reinforcement learning methods have departed from the incremental learning required for eligibility traces, rendering the implementation of the $\lambda$-return difficult in this context.