no code implementations • 26 May 2024 • Itai Shufaro, Nadav Merlis, Nir Weinberger, Shie Mannor
Using this setting, we introduce the first Bayesian regret lower bounds that depend on the information an agent accumulates.
no code implementations • 11 Mar 2024 • Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor
We present the first finite-time global convergence analysis of policy gradient for infinite-horizon average-reward Markov decision processes (MDPs).