no code implementations • 13 Feb 2022 • Aymen Al Marjani, Tomáš Kocák, Aurélien Garivier
Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, thus making our bound tighter than the one provided by Mason et al. (2020).
no code implementations • NeurIPS 2021 • Aymen Al Marjani, Aurélien Garivier, Alexandre Proutiere
We investigate the classical active pure exploration problem in Markov Decision Processes, where the agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
no code implementations • 28 Sep 2020 • Aymen Al Marjani, Alexandre Proutiere
We then provide a simple and tight upper bound on the sample complexity lower bound, for which the corresponding nearly optimal sample allocation becomes explicit.