24 Feb 2024 • Yassir Jedra, William Réveillard, Stefan Stojanovic, Alexandre Proutiere
For policy evaluation and best policy identification, we show that our algorithms are nearly minimax optimal.
Multi-Armed Bandits