no code implementations • 2 May 2024 • Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde
We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards.
no code implementations • 16 Feb 2021 • Bingshan Hu, Zhiming Huang, Nishant A. Mehta
Specifically, for the problem of decision-theoretic online learning with stochastic rewards, we present the first algorithm that achieves an $O\left(\frac{\log K}{\Delta_{\min}} + \frac{\log(K) \min\left\{\log\left(\frac{1}{\Delta_{\min}}\right), \log(T)\right\}}{\epsilon}\right)$ regret bound, where $\Delta_{\min}$ is the minimum mean reward gap.
no code implementations • 14 May 2020 • Zhiming Huang, Yifan Xu, Bingshan Hu, QiPeng Wang, Jianping Pan
We study the combinatorial sleeping multi-armed semi-bandit problem with long-term fairness constraints (CSMAB-F).
no code implementations • 8 Sep 2017 • Zhiming Huang, Lin Yang, Wen Jiang
Social dilemmas have been regarded as the essence of evolutionary game theory, in which the prisoner's dilemma game is the most famous metaphor for the problem of cooperation.