no code implementations • 19 Jan 2022 • Stephen McAleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox
PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.
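To make the tabular double oracle (DO) method concrete, the following is a minimal illustrative sketch on a small zero-sum matrix game — not the authors' PSRO code. It assumes a row-maximizer payoff matrix, solves each restricted game with a standard linear program (`scipy.optimize.linprog`), and expands the restricted strategy sets with best responses until no new best response appears; the `exploitability` helper measures how far the current restricted-game solution is from a full-game Nash equilibrium.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Nash equilibrium of a zero-sum matrix game (row player maximizes
    x^T A y) via the standard LP formulation; returns (x, y)."""
    m, n = A.shape
    # Row player: maximize v subject to (A^T x)_j >= v for all j, x in simplex.
    c = np.zeros(m + 1); c[-1] = -1.0           # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - (A^T x)_j <= 0
    res = linprog(c, A_ub, np.zeros(n),
                  np.hstack([np.ones((1, m)), np.zeros((1, 1))]), np.ones(1),
                  bounds=[(0, None)] * m + [(None, None)])
    # Column player: minimize w subject to (A y)_i <= w for all i, y in simplex.
    c2 = np.zeros(n + 1); c2[-1] = 1.0
    A_ub2 = np.hstack([A, -np.ones((m, 1))])    # (A y)_i - w <= 0
    res2 = linprog(c2, A_ub2, np.zeros(m),
                   np.hstack([np.ones((1, n)), np.zeros((1, 1))]), np.ones(1),
                   bounds=[(0, None)] * n + [(None, None)])
    return res.x[:m], res2.x[:n]

def exploitability(A, x, y):
    """Sum of both players' best-response gains against (x, y); 0 at a Nash."""
    return (A @ y).max() - (x @ A).min()

def double_oracle(A, max_iters=100):
    """Tabular double oracle: iteratively solve the restricted game and
    add each player's full-game best response to the restricted sets."""
    rows, cols = [0], [0]
    for _ in range(max_iters):
        xr, yr = solve_zero_sum(A[np.ix_(rows, cols)])
        # Lift restricted mixtures to the full strategy spaces.
        x = np.zeros(A.shape[0]); x[rows] = xr
        y = np.zeros(A.shape[1]); y[cols] = yr
        br_row = int(np.argmax(A @ y))   # row best response to y
        br_col = int(np.argmin(x @ A))   # column best response to x
        if br_row in rows and br_col in cols:
            return x, y                  # converged: no new best responses
        rows = sorted(set(rows + [br_row]))
        cols = sorted(set(cols + [br_col]))
    return x, y
```

On rock–paper–scissors, `double_oracle` grows the restricted sets until the full uniform equilibrium is reached; tracking `exploitability` of the intermediate `(x, y)` profiles shows the behavior noted above — it can rise from one iteration to the next even though the tabular method is guaranteed to converge eventually.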
Multi-agent Reinforcement Learning · Reinforcement Learning +2
no code implementations • 7 Jun 2021 • Stephen McAleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox
Machine learning algorithms often make decisions on behalf of agents with varied and sometimes conflicting interests.
1 code implementation • NeurIPS 2021 • Stephen McAleer, John Lanier, Kevin Wang, Pierre Baldi, Roy Fox
NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.
2 code implementations • NeurIPS 2020 • Stephen McAleer, John Lanier, Roy Fox, Pierre Baldi
We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$.
no code implementations • 10 Dec 2019 • Alexander Shmakov, John Lanier, Stephen McAleer, Rohan Achar, Cristina Lopes, Pierre Baldi
Much of recent success in multiagent reinforcement learning has been in two-player zero-sum games.
Multiagent Systems