no code implementations • 9 Aug 2023 • Leo Benac, Sonali Parbhoo, Finale Doshi-Velez
Offline Reinforcement learning is commonly used for sequential decision-making in domains such as healthcare and education, where the rewards are known and the transition dynamics $T$ must be estimated on the basis of batch data.
no code implementations • 28 Sep 2021 • Leo Benac, Frédéric Godin
Simulation experiments assess the performance of the arm selection algorithms based on the two novel estimation approaches, and such policies are shown to outperform naive benchmarks not taking non-stationarity into account.