no code implementations • 9 Jan 2023 • Fengyin Li, Yuqiang Li, Xianyi Wu
Reinforcement learning policy evaluation problems are often modeled as finite-horizon MDPs or as discounted or average-reward infinite-horizon MDPs.
no code implementations • 20 Aug 2018 • Wenqing Bao, Xiaoqiang Cai, Xianyi Wu
This paper proposes a general framework for multi-armed bandit (MAB) processes by introducing a type of restriction on switches among arms evolving in continuous time.