30 May 2024 • Wooseong Cho, TaeHyun Hwang, Joongkyu Lee, Min-hwan Oh
For our first algorithm, $\texttt{RRL-MNL}$, we adapt optimistic sampling so that the estimated value function is optimistic sufficiently often, and we show that $\texttt{RRL-MNL}$ is both statistically and computationally efficient, achieving a $\tilde{O}(\kappa^{-1} d^{\frac{3}{2}} H^{\frac{3}{2}} \sqrt{T})$ frequentist regret bound with constant-time computational cost per episode.
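The general idea behind optimistic sampling can be illustrated with a minimal sketch, not the paper's actual $\texttt{RRL-MNL}$ algorithm: draw several perturbed parameter samples around the current estimate and keep, per action, the largest resulting value, so that the estimate is optimistic with high probability. The function name, the linear value model, and all parameters here are hypothetical illustrations.

```python
import numpy as np

def optimistic_values(features, theta_hat, cov_inv, num_samples=10,
                      scale=1.0, rng=None):
    """Hedged sketch of optimistic sampling for a linear value model.

    features : (num_actions, d) feature matrix (hypothetical).
    theta_hat: (d,) current parameter estimate.
    cov_inv  : (d, d) inverse covariance used as the perturbation shape.
    Returns one optimistic value per action.
    """
    rng = rng or np.random.default_rng(0)
    # Draw several perturbed parameter samples around the estimate.
    samples = rng.multivariate_normal(theta_hat, scale * cov_inv,
                                      size=num_samples)  # (num_samples, d)
    # Value of each action under each sampled parameter.
    values = features @ samples.T                        # (num_actions, num_samples)
    # Elementwise max over samples yields optimism with high probability.
    return values.max(axis=1)
```

Taking the max over multiple samples (rather than a single posterior draw) is what boosts the frequency of optimism; the number of samples trades computation against the probability that the estimate dominates the true value.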
16 May 2024 • Joongkyu Lee, Min-hwan Oh
To the best of our knowledge, this is the first work in the contextual MNL bandit literature to prove minimax optimality -- for either the uniform or the non-uniform reward setting -- and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
8 Feb 2024 • Joongkyu Lee, Seung Joon Park, Yunhao Tang, Min-hwan Oh
In reinforcement learning, temporal abstraction in the action space, exemplified by action repetition, is a technique to facilitate policy learning through extended actions.
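Action repetition as temporal abstraction can be sketched with a minimal environment wrapper, assuming a toy gym-style `step` interface; this is a generic illustration, not the paper's method, and `ActionRepeatWrapper` and `ToyChainEnv` are hypothetical names.

```python
class ToyChainEnv:
    """Toy deterministic chain: reward 1 per step, terminates at state 10."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        self.state += 1
        done = self.state >= 10
        return self.state, 1.0, done


class ActionRepeatWrapper:
    """Each decision applies a primitive action `repeat` times in a row,
    turning one decision into an extended (temporally abstract) action."""

    def __init__(self, env, max_repeat=4):
        self.env = env
        self.max_repeat = max_repeat

    def step(self, action, repeat):
        # Clamp the repetition count to the allowed range.
        repeat = max(1, min(repeat, self.max_repeat))
        total_reward, done, obs = 0.0, False, None
        for _ in range(repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:  # stop repeating once the episode ends
                break
        return obs, total_reward, done
```

Because one decision now covers several environment steps, the policy makes fewer choices per episode, which is the sense in which action repetition facilitates learning.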