NeurIPS 2021 • Cem Kalkanli, Ayfer Ozgur
We show that Thompson sampling combined with an adaptive batching strategy can achieve similar performance without knowing the time horizon $T$ of the problem and without having to carefully optimize the batch structure for a given $T$ to achieve a target regret bound (i.e., problem-dependent vs. minimax regret).
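The summary above does not spell out the batching rule. As a rough illustration only, the sketch below runs Beta-Bernoulli Thompson sampling in which the posterior is refreshed only at batch ends, and a batch closes once some arm's pull count has doubled since the batch began. The doubling rule, the Bernoulli reward model, the Beta(1, 1) priors, and the name `batched_ts_adaptive` are all assumptions made for this example and are not necessarily the authors' exact strategy. The point it shows is that the batching rule itself never consults $T$; the horizon is used only to stop the simulation.

```python
import random


def batched_ts_adaptive(means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling with a HYPOTHETICAL doubling batch rule.

    Assumption (for illustration): a batch ends once some arm's pull count
    reaches twice its value at the start of the batch. Rewards are revealed
    to the agent only at batch ends. The rule does not use `horizon`.
    Returns (total_reward, number_of_batches, pull_counts).
    """
    rng = random.Random(seed)
    k = len(means)
    alpha = [1] * k          # Beta(1, 1) priors; frozen within a batch
    beta = [1] * k
    pulls = [0] * k
    start_pulls = pulls[:]   # pull counts at the start of the current batch
    pending = []             # (arm, reward) pairs not yet revealed to the agent
    batches = 0
    total = 0
    for t in range(horizon):
        # Sample from the posterior as of the last batch boundary
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < means[arm] else 0
        total += reward
        pulls[arm] += 1
        pending.append((arm, reward))
        doubled = any(pulls[i] >= 2 * max(start_pulls[i], 1) for i in range(k))
        if doubled or t == horizon - 1:
            # Batch end: reveal all pending rewards and update the posterior
            for a, r in pending:
                alpha[a] += r
                beta[a] += 1 - r
            pending.clear()
            start_pulls = pulls[:]
            batches += 1
    return total, batches, pulls


total, batches, pulls = batched_ts_adaptive([0.2, 0.5, 0.8], 10_000, seed=1)
print(total, batches, pulls)
```

Under this rule each batch end requires some arm's count to at least double, so the number of batches is at most about $k \log_2 T$ (plus one final batch), while the agent never needs to know $T$ in advance.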
1 Oct 2021 • Cem Kalkanli, Ayfer Ozgur
We study the asymptotic performance of the Thompson sampling algorithm in the batched multi-armed bandit setting, where the time horizon $T$ is divided into batches and the agent cannot observe the rewards of her actions until the end of each batch.
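A minimal simulation of the batched setting described above may help make the delayed-feedback constraint concrete. It assumes Bernoulli rewards, Beta(1, 1) priors, and $T$ split into nearly equal consecutive batches; these choices and the name `batched_ts_fixed` are hypothetical and serve only as an illustration. Within a batch the agent keeps sampling from a stale posterior, and the observed rewards are folded in only at the batch boundary.

```python
import random


def batched_ts_fixed(means, horizon, num_batches, seed=0):
    """Beta-Bernoulli Thompson sampling with fixed, nearly equal batches.

    Assumptions (for illustration): Bernoulli rewards with the given means,
    Beta(1, 1) priors, and the horizon split into `num_batches` batches.
    Returns (total_reward, pull_counts, batch_sizes).
    """
    rng = random.Random(seed)
    k = len(means)
    alpha = [1.0] * k
    beta = [1.0] * k
    pulls = [0] * k
    total = 0
    # Split the horizon into num_batches consecutive batches of size base or base + 1
    base, extra = divmod(horizon, num_batches)
    sizes = [base + (1 if b < extra else 0) for b in range(num_batches)]
    for size in sizes:
        batch_obs = []
        for _ in range(size):
            # Posterior is not updated inside the batch
            samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
            arm = max(range(k), key=samples.__getitem__)
            reward = 1 if rng.random() < means[arm] else 0
            batch_obs.append((arm, reward))
            pulls[arm] += 1
            total += reward
        # Rewards are revealed to the agent only at the end of the batch
        for a, r in batch_obs:
            alpha[a] += r
            beta[a] += 1 - r
    return total, pulls, sizes


total, pulls, sizes = batched_ts_fixed([0.3, 0.7], 1000, 7, seed=2)
print(total, pulls, sizes)
```

With `num_batches == horizon`, every batch has a single round and the sketch reduces to standard, fully sequential Thompson sampling; fewer batches mean longer stretches of play on outdated information.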
8 Nov 2020 • Cem Kalkanli, Ayfer Ozgur
Thompson sampling has been shown to be an effective policy across a variety of online learning tasks.