1 code implementation • NeurIPS 2023 • Brahma S. Pavse, Josiah P. Hanna
Instead, in this paper, we seek to enhance the data-efficiency of fitted Q-evaluation (FQE) by first transforming the fixed dataset using a learned encoder, and then feeding the transformed dataset into FQE.
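For context, FQE estimates an evaluation policy's Q-function by repeatedly regressing onto bootstrapped targets built from a fixed dataset. The sketch below is a minimal tabular version, not the paper's method: the function names are invented, the encoder `phi` is a stand-in identity map (the paper's encoder is learned), and the "regression" step reduces to averaging targets per state-action pair.

```python
import numpy as np

def fqe(dataset, pi_e, n_states, n_actions, gamma=0.9,
        n_iters=200, phi=lambda s: s):
    """Tabular fitted Q-evaluation sketch (hypothetical helper).

    dataset: list of (s, a, r, s') transitions collected by some behavior policy
    pi_e:    evaluation policy as an array, pi_e[s] -> action
    phi:     stand-in for a learned state encoder (identity by default)
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Build bootstrapped regression targets r + gamma * Q(s', pi_e(s')).
        targets = {}
        for s, a, r, s2 in dataset:
            s, s2 = phi(s), phi(s2)
            y = r + gamma * Q[s2, pi_e[s2]]
            targets.setdefault((s, a), []).append(y)
        Q_new = Q.copy()
        for (s, a), ys in targets.items():
            Q_new[s, a] = np.mean(ys)  # tabular "least-squares fit" = mean
        Q = Q_new
    return Q

# Toy check: a single self-looping state with reward 1 should converge
# to the geometric series 1 / (1 - gamma) = 10 for gamma = 0.9.
dataset = [(0, 0, 1.0, 0)]
Q = fqe(dataset, pi_e=np.array([0]), n_states=1, n_actions=1)
```

The point of the encoder in the paper is that a well-chosen `phi` makes this regression problem easier to fit from limited data; the identity map above is only a placeholder.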
1 code implementation • 2 Jun 2023 • Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna
This latter objective is called stability, and it is especially important when the state space is unbounded: states can then be arbitrarily far apart, and the agent can drift far away from the desired states.
no code implementations • 14 Dec 2022 • Brahma S. Pavse, Josiah P. Hanna
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, $\pi_e$, using a fixed dataset, $\mathcal{D}$, collected by one or more policies that may be different from $\pi_e$.
no code implementations • 18 Jun 2019 • Brahma S. Pavse, Faraz Torabi, Josiah P. Hanna, Garrett Warnell, Peter Stone
Augmenting reinforcement learning with imitation learning is often hailed as a method by which to improve upon learning from scratch.