NeurIPS 2020 • Lin Yang, Mohammad Hajiesmaili, Mohammad Sadegh Talebi, John C. S. Lui, Wing Shing Wong
We characterize the regret of ExpRb as a function of the corruption budget and show that for the case of a known corruption budget, the regret of ExpRb is tight.
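The source does not specify ExpRb's update rule, so as background only, here is a minimal sketch of the generic Exp3 exponential-weights bandit that algorithms of this family build on; it is not the ExpRb algorithm itself, and the learning rate `eta` and the reward setup are assumptions for illustration.

```python
import numpy as np

def exp3(rewards, eta):
    """Generic Exp3 exponential-weights bandit (illustrative sketch, not ExpRb).

    rewards: (T, K) array of per-arm rewards in [0, 1]; only the pulled
             arm's reward is revealed to the learner each round.
    eta: learning rate (an assumed tuning parameter here).
    Returns the learner's cumulative reward.
    """
    T, K = rewards.shape
    weights = np.ones(K)
    rng = np.random.default_rng(0)  # fixed seed for reproducibility
    total = 0.0
    for t in range(T):
        probs = weights / weights.sum()
        arm = rng.choice(K, p=probs)
        r = rewards[t, arm]
        total += r
        # importance-weighted estimate: nonzero only for the pulled arm
        est = np.zeros(K)
        est[arm] = r / probs[arm]
        weights *= np.exp(eta * est)
        weights /= weights.max()  # renormalize to avoid overflow
    return total
```

With one clearly best arm, the weights concentrate on it and the cumulative reward approaches the best-arm benchmark; corruption-robust variants modify how these importance-weighted estimates are formed.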
9 Sep 2020 • Mohammad Sadegh Talebi, Anders Jonsson, Odalric-Ambrym Maillard
We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP).
ICML 2020 • Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
In pursuit of practical efficiency, we present UCRL3, along the lines of UCRL2 but with two key modifications: First, it uses state-of-the-art time-uniform concentration inequalities to compute confidence sets on the reward and (component-wise) transition distributions for each state-action pair.
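To make the idea of component-wise confidence sets concrete, here is a minimal sketch that builds per-component intervals around the empirical transition distribution of one state-action pair. It uses a simple Hoeffding-style radius as a placeholder; the time-uniform concentration inequalities used by UCRL3 are tighter, and `delta` is an assumed confidence parameter.

```python
import numpy as np

def transition_confidence(counts, delta):
    """Component-wise confidence intervals on next-state probabilities
    for a single state-action pair (illustrative Hoeffding-style sketch,
    not the time-uniform bounds of UCRL3).

    counts: visit counts to each next state under this state-action pair.
    delta: confidence parameter.
    Returns (empirical distribution, lower bounds, upper bounds).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p_hat = counts / max(n, 1.0)
    # one Hoeffding interval per component, union bound over components
    radius = np.sqrt(np.log(2 * len(counts) / delta) / (2 * max(n, 1.0)))
    lower = np.clip(p_hat - radius, 0.0, 1.0)
    upper = np.clip(p_hat + radius, 0.0, 1.0)
    return p_hat, lower, upper
```

An optimistic planner then searches over all transition vectors lying inside these component-wise boxes (intersected with the probability simplex), which is what makes smaller confidence sets translate directly into less over-exploration.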
9 Oct 2019 • Mahsa Asadi, Mohammad Sadegh Talebi, Hippolyte Bourel, Odalric-Ambrym Maillard
In the case of an unknown equivalence structure, we show through numerical experiments that C-UCRL combined with ApproxEquivalence outperforms UCRL2 in ergodic MDPs.
NeurIPS 2019 • Mohammad Sadegh Talebi, Odalric-Ambrym Maillard
We study the problem of learning the transition matrices of a set of Markov chains from a single stream of observations on each chain.
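The basic estimator underlying this problem is the empirical transition matrix built from a single trajectory. The sketch below shows it for one chain, with an assumed uniform fallback for states never visited in the stream; the paper's contribution concerns the sample-complexity analysis of such estimates, not this construction itself.

```python
import numpy as np

def estimate_transition_matrix(stream, n_states):
    """Empirical transition matrix of a Markov chain from one observation
    stream (a single trajectory, no resets).

    stream: sequence of visited states, e.g. [0, 1, 0, 2, ...].
    n_states: number of states of the chain.
    """
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(stream[:-1], stream[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumed convention: states never visited get a uniform row,
    # so every row is a valid probability distribution.
    P_hat = np.where(row_sums > 0,
                     counts / np.maximum(row_sums, 1),
                     1.0 / n_states)
    return P_hat
```

Because all transitions come from one unbroken stream, the per-state sample sizes are random and coupled through the chain's mixing behavior, which is precisely what makes the estimation analysis nontrivial.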
5 Mar 2018 • Mohammad Sadegh Talebi, Odalric-Ambrym Maillard
We consider reinforcement learning in an unknown, discrete Markov Decision Process (MDP) under the average-reward criterion, where the learner interacts with the system in a single stream of observations, starting from an initial state and without any reset.