Sample Efficient Stochastic Policy Extragradient Algorithm for Zero-Sum Markov Game

ICLR 2022 · Ziyi Chen, Shaocong Ma, Yi Zhou

The two-player zero-sum Markov game is a fundamental problem in reinforcement learning and game theory. Although many algorithms have been proposed for solving zero-sum Markov games, they generally lack one or more desirable features: being model-free, provable convergence, sample efficiency, and symmetric and private policy updates. In this paper, we develop a fully decentralized stochastic policy extragradient algorithm that has all of these properties. In particular, our algorithm introduces multiple stochastic estimators to accurately estimate the value functions involved in the stochastic updates, and leverages entropy regularization to accelerate convergence. Specifically, with a proper entropy-regularization parameter, we prove that the stochastic policy extragradient algorithm has a sample complexity of order $\mathcal{O}(\frac{t_{\text{mix}}A_{\max}}{\mu_{\text{min}}\epsilon^{5.5}(1-\gamma)^{13.5}})$ for finding a solution that achieves an $\epsilon$-Nash equilibrium duality gap. This sample complexity substantially improves on the state-of-the-art complexity results.
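
The combination of an extragradient step with entropy regularization can be illustrated, under heavy simplification, on a single-state zero-sum matrix game. The sketch below is not the paper's algorithm: it uses exact payoff gradients instead of stochastic value-function estimators, and a single payoff matrix `A` instead of a Markov game. All function names, the step size `eta`, the regularization weight `tau`, and the rock-paper-scissors example are assumptions made for illustration. Note that each player updates only from its own policy and its own payoff vector, which mirrors the symmetric, private updates the abstract describes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def mat_vec(A, y):
    """Payoff of each row action against column policy y: (A y)."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def vec_mat(x, A):
    """Payoff of each column action against row policy x: (x^T A)."""
    return [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def mw_step(p, grad, eta, tau):
    """Entropy-regularized multiplicative-weights step:
    p_new is proportional to p^(1 - eta*tau) * exp(eta * grad)."""
    logits = [(1.0 - eta * tau) * math.log(pi) + eta * g for pi, g in zip(p, grad)]
    return softmax(logits)


def duality_gap(A, x, y):
    """Nash duality gap of (x, y) for the unregularized game max_x min_y x^T A y."""
    return max(mat_vec(A, y)) - min(vec_mat(x, A))


def entropy_regularized_extragradient(A, x, y, eta=0.2, tau=0.1, iters=2000):
    """Hypothetical helper: extragradient with entropy regularization.
    The row player maximizes x^T A y, the column player minimizes it."""
    for _ in range(iters):
        # Extrapolation step: gradients evaluated at the current policies.
        x_mid = mw_step(x, mat_vec(A, y), eta, tau)
        y_mid = mw_step(y, [-g for g in vec_mat(x, A)], eta, tau)
        # Update step: start again from (x, y), but use midpoint gradients.
        x_new = mw_step(x, mat_vec(A, y_mid), eta, tau)
        y_new = mw_step(y, [-g for g in vec_mat(x_mid, A)], eta, tau)
        x, y = x_new, y_new
    return x, y


# Rock-paper-scissors payoff matrix for the row player (illustrative choice).
A = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
x, y = entropy_regularized_extragradient(A, [0.7, 0.2, 0.1], [0.1, 0.1, 0.8])
print("row policy:", x)
print("column policy:", y)
print("duality gap:", duality_gap(A, x, y))
```

In this example both policies should approach the uniform distribution, which is the equilibrium of rock-paper-scissors with or without the entropy term. In general, a larger `tau` speeds up convergence but biases the solution away from the unregularized equilibrium, which is why the abstract speaks of choosing a proper entropy-regularization parameter.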

Methods

Entropy Regularization
