no code implementations • ICML 2020 • Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
In pursuit of practical efficiency, we present UCRL3, along the lines of UCRL2, but with two key modifications: First, it uses state-of-the-art time-uniform concentration inequalities to compute confidence sets on the reward and (component-wise) transition distributions for each state-action pair.
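As an illustration of the kind of time-uniform concentration bound the abstract refers to, the sketch below implements a Laplace-method confidence radius for the empirical mean of sub-Gaussian samples, valid simultaneously over all sample sizes. This is a generic textbook bound, not necessarily the exact inequality used in UCRL3; the function name and its default `sigma=1.0` are illustrative choices.

```python
import math

def time_uniform_radius(t, delta, sigma=1.0):
    """Laplace-method time-uniform confidence radius (illustrative).

    For i.i.d. sigma-sub-Gaussian samples, with probability at least
    1 - delta the true mean lies within this radius of the empirical
    mean simultaneously for ALL t >= 1 -- unlike a fixed-t Hoeffding
    bound, no union bound over time steps is needed.
    """
    return sigma * math.sqrt(
        2.0 * (1.0 + 1.0 / t) * math.log(math.sqrt(t + 1.0) / delta) / t
    )

# The radius shrinks as more samples accumulate, e.g.:
r10 = time_uniform_radius(10, delta=0.05)
r100 = time_uniform_radius(100, delta=0.05)
```

In an optimistic algorithm such as UCRL2/UCRL3, a radius of this form around the empirical reward mean of each state-action pair defines the confidence set used when computing the optimistic policy.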
no code implementations • 9 Oct 2019 • Mahsa Asadi, Mohammad Sadegh Talebi, Hippolyte Bourel, Odalric-Ambrym Maillard
In the case of an unknown equivalence structure, we show through numerical experiments that C-UCRL combined with ApproxEquivalence outperforms UCRL2 in ergodic MDPs.
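To give a concrete feel for estimating an unknown equivalence structure, the sketch below greedily groups state-action pairs whose empirical transition distributions are close in L1 distance. This is only an assumed illustration of the idea; the actual ApproxEquivalence procedure in the paper may use a different statistic or clustering rule, and `approx_equivalence_classes` and `eps` are hypothetical names.

```python
import numpy as np

def approx_equivalence_classes(p_hat, eps):
    """Greedily cluster rows of p_hat (empirical transition distributions,
    one per state-action pair) whose pairwise L1 distance is at most eps.
    Illustrative sketch only, not the paper's exact algorithm."""
    n = len(p_hat)
    assigned = [False] * n
    classes = []
    for i in range(n):
        if assigned[i]:
            continue
        cls = [i]          # start a new class anchored at pair i
        assigned[i] = True
        for j in range(i + 1, n):
            if not assigned[j] and np.abs(p_hat[i] - p_hat[j]).sum() <= eps:
                cls.append(j)
                assigned[j] = True
        classes.append(cls)
    return classes

# Two near-identical distributions are merged; the distinct one is not.
p_hat = np.array([[0.50, 0.50],
                  [0.52, 0.48],
                  [0.90, 0.10]])
groups = approx_equivalence_classes(p_hat, eps=0.1)  # → [[0, 1], [2]]
```

Pooling the samples of each estimated class is what lets a C-UCRL-style algorithm shrink its confidence sets faster than UCRL2, which treats every state-action pair independently.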