no code implementations • ICML 2020 • Somdeb Majumdar, Shauharda Khadka, Santiago Miret, Stephen McAleer, Kagan Tumer
Training policies solely on the team-based reward is often difficult due to its sparsity.
1 code implementation • 19 Apr 2024 • Pengdeng Li, Shuxin Li, Xinrun Wang, Jakub Cerny, Youzhi Zhang, Stephen McAleer, Hau Chan, Bo An
Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments such as urban street networks.
1 code implementation • 17 Apr 2024 • Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen McAleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell
The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".
no code implementations • 4 Mar 2024 • Ariyan Bighashdel, Yongzhao Wang, Stephen McAleer, Rahul Savani, Frans A. Oliehoek
In game theory, a game refers to a model of interaction among rational decision-makers or players, making choices with the goal of achieving their individual objectives.
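As a toy illustration (the encoding below is my own, not from the paper), a two-player normal-form game is just one payoff table per player; matching pennies shows players with directly opposed objectives:

```python
# Matching pennies: each player picks heads (0) or tails (1).
# Player 1 wins on a match, player 2 wins on a mismatch.
P1 = [[1, -1],
      [-1, 1]]   # row player's payoffs
P2 = [[-1, 1],
      [1, -1]]   # column player's payoffs

def best_response(payoff, opponent_action, as_row):
    """Pure best response given the opponent's fixed pure action."""
    if as_row:
        return max(range(2), key=lambda a: payoff[a][opponent_action])
    return max(range(2), key=lambda a: payoff[opponent_action][a])
```

Chasing pure best responses here cycles forever, which is why the equilibrium of this game is mixed (both players randomize 50/50).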
1 code implementation • 30 Jan 2024 • Paul Friedrich, Yulun Zhang, Michael Curry, Ludwig Dierks, Stephen McAleer, Jiaoyang Li, Tuomas Sandholm, Sven Seuken
Multi-Agent Path Finding (MAPF) involves determining collision-free paths for multiple agents moving simultaneously through a shared area toward given goal locations.
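The collision-free requirement is usually formalized as ruling out two event types; a minimal sketch (the function and path encoding are illustrative, not from the paper):

```python
# Illustrative check for the two standard MAPF collision types: a vertex
# collision (two agents in the same cell at the same timestep) and an edge
# collision (two agents swapping cells between consecutive timesteps).
def find_collisions(paths):
    """paths: dict mapping agent id -> list of (row, col) cells per timestep."""
    collisions = []
    agents = sorted(paths)
    horizon = max(len(p) for p in paths.values())

    def at(agent, t):
        # Agents that reach their goal early are assumed to wait there.
        p = paths[agent]
        return p[t] if t < len(p) else p[-1]

    for t in range(horizon):
        for i, a in enumerate(agents):
            for b in agents[i + 1:]:
                if at(a, t) == at(b, t):
                    collisions.append(("vertex", a, b, t))
                elif t > 0 and at(a, t) == at(b, t - 1) and at(b, t) == at(a, t - 1):
                    collisions.append(("edge", a, b, t))
    return collisions
```

For example, two agents swapping adjacent cells produce an edge collision even though they never occupy the same cell at the same time.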
no code implementations • 30 Oct 2023 • Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao
The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.
4 code implementations • 16 Oct 2023 • Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck
We present Llemma, a large language model for mathematics.
Ranked #6 on Automated Theorem Proving (miniF2F-test)
1 code implementation • 6 Oct 2023 • Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer
Large language models are typically aligned with human preferences by optimizing $\textit{reward models}$ (RMs) fitted to human feedback.
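Such reward models are typically fitted with the Bradley-Terry pairwise objective; a minimal sketch (the scalar scores below stand in for reward-model outputs on human-chosen and human-rejected responses):

```python
import math

def reward_model_loss(chosen_scores, rejected_scores):
    """Bradley-Terry pairwise loss commonly used to fit reward models:
    for each (chosen, rejected) pair, maximize the modeled probability
    sigmoid(r_chosen - r_rejected) that the human-chosen response wins."""
    total = 0.0
    for r_c, r_r in zip(chosen_scores, rejected_scores):
        total += -math.log(1.0 / (1.0 + math.exp(-(r_c - r_r))))
    return total / len(chosen_scores)
```

When the two scores are equal the loss is log 2, and it falls as the model puts a larger margin on the preferred response.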
no code implementations • 9 Aug 2023 • Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen McAleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang
This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi.
no code implementations • 22 Jul 2023 • Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Xiangyu Liu, Benjamin Eysenbach, Tuomas Sandholm, Furong Huang, Stephen McAleer
To tackle this challenge, we propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game.
1 code implementation • NeurIPS 2023 • Geunwoo Kim, Pierre Baldi, Stephen McAleer
We compare multiple LLMs and find that RCI with the InstructGPT-3+RLHF LLM is state-of-the-art on MiniWoB++, using only a handful of demonstrations per task rather than tens of thousands, and without a task-specific reward function.
1 code implementation • 1 Mar 2023 • Chenguang Wang, Zhouliang Yu, Stephen McAleer, Tianshu Yu, Yaodong Yang
Applying machine learning to combinatorial optimization problems has the potential to improve both efficiency and accuracy.
no code implementations • 7 Feb 2023 • Lukas Schäfer, Oliver Slumbers, Stephen McAleer, Yali Du, Stefano V. Albrecht, David Mguni
In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a general framework to seamlessly extend value-based MARL algorithms with ensembles of value functions.
no code implementations • 5 Oct 2022 • Luke Marris, Marc Lanctot, Ian Gemp, Shayegan Omidshafiei, Stephen McAleer, Jerome Connor, Karl Tuyls, Thore Graepel
Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting.
1 code implementation • 16 Sep 2022 • Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox
On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average.
no code implementations • 20 Jul 2022 • Tim Franzmeyer, Stephen McAleer, João F. Henriques, Jakob N. Foerster, Philip H. S. Torr, Adel Bibi, Christian Schroeder de Witt
Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs.
no code implementations • 19 Jul 2022 • JB Lanier, Stephen McAleer, Pierre Baldi, Roy Fox
In this paper, we propose Feasible Adversarial Robust RL (FARR), a novel problem formulation and objective for automatically determining the set of environment parameter values over which to be robust.
no code implementations • 13 Jul 2022 • Stephen McAleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm
Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well.
1 code implementation • 30 Jun 2022 • Julien Perolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls
It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes).
1 code implementation • 8 Jun 2022 • Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm
DREAM, the only current CFR-based neural method that is model free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).
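The variance issue can be illustrated generically (this is the textbook importance-sampling effect the estimator inherits, not DREAM's actual target): the estimator reweights sampled values by p/q, and its variance explodes when the sampling policy rarely chooses actions the target policy favors.

```python
def is_estimator_variance(target_p, behavior_q, values):
    """Exact variance of the importance-sampling estimator
    X = (p(a)/q(a)) * value(a) with a ~ q, over a finite action set.
    Its mean equals E_p[value] for any q, but the variance blows up
    when q places little mass where p is large."""
    mean = sum(p * v for p, v in zip(target_p, values))
    second_moment = sum(q * ((p / q) * v) ** 2
                        for p, q, v in zip(target_p, behavior_q, values))
    return second_moment - mean ** 2

# Same target policy and values; only the sampling policy changes.
uniform = is_estimator_variance([0.5, 0.5], [0.5, 0.5], [1.0, -1.0])
skewed = is_estimator_variance([0.5, 0.5], [0.99, 0.01], [1.0, -1.0])
```

Here the skewed sampler inflates the variance by more than 25x even though both estimators are unbiased for the same quantity.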
no code implementations • 19 Jan 2022 • Stephen McAleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox
PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.
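For reference, the tabular double oracle loop that PSRO generalizes can be sketched on rock-paper-scissors; fictitious play below is a stand-in for the exact equilibrium solver DO would use on the restricted game:

```python
# Toy tabular double oracle (DO) on rock-paper-scissors.
# Actions: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [[0, -1, 1],   # row player's payoff: rock vs (rock, paper, scissors)
          [1, 0, -1],   # paper
          [-1, 1, 0]]   # scissors

def best_response(mixture):
    """Pure best response (and its value) against a column mixed strategy."""
    values = [sum(p * PAYOFF[a][b] for b, p in mixture.items()) for a in range(3)]
    best = max(range(3), key=lambda a: values[a])
    return best, values[best]

def restricted_equilibrium(rows, cols, iters=3000):
    """Fictitious play on the restricted game (approximate solver)."""
    row_counts = {a: 1 for a in rows}
    col_counts = {b: 1 for b in cols}
    for _ in range(iters):
        col_mix = {b: c / sum(col_counts.values()) for b, c in col_counts.items()}
        row_mix = {a: c / sum(row_counts.values()) for a, c in row_counts.items()}
        br_row = max(rows, key=lambda a: sum(p * PAYOFF[a][b] for b, p in col_mix.items()))
        br_col = max(cols, key=lambda b: -sum(p * PAYOFF[a][b] for a, p in row_mix.items()))
        row_counts[br_row] += 1
        col_counts[br_col] += 1
    tr, tc = sum(row_counts.values()), sum(col_counts.values())
    return ({a: c / tr for a, c in row_counts.items()},
            {b: c / tc for b, c in col_counts.items()})

def double_oracle():
    rows, cols = [0], [0]          # start with a single pure strategy each
    while True:
        row_mix, col_mix = restricted_equilibrium(rows, cols)
        br_row, _ = best_response(col_mix)
        # Column player's best response minimizes the row payoff.
        br_col = min(range(3), key=lambda b: sum(p * PAYOFF[a][b] for a, p in row_mix.items()))
        grew = False
        if br_row not in rows:
            rows.append(br_row); grew = True
        if br_col not in cols:
            cols.append(br_col); grew = True
        if not grew:  # no new best responses: restricted equilibrium is a full-game NE
            return rows, cols, row_mix, col_mix
```

On this game the populations grow rock -> paper -> scissors and terminate once neither player has a new best response, at the (near-)uniform equilibrium.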
no code implementations • 6 Dec 2021 • Yaosheng Xu, Dailin Hu, Litian Liang, Stephen McAleer, Pieter Abbeel, Roy Fox
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings.
1 code implementation • NeurIPS 2021 • Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang
When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.
no code implementations • 28 Oct 2021 • Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox
Under the belief that $\beta$ is closely related to the (state-dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of the model parameters that characterizes model uncertainty.
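A minimal sketch of the general idea (illustrative only; EQL's exact schedule differs): use soft, log-sum-exp backups whose inverse temperature $\beta$ shrinks when an ensemble of Q-estimates disagrees, i.e. when model uncertainty is high:

```python
import math
import statistics

def soft_value(q_values, beta):
    """Soft state value V = (1/beta) * log mean_a exp(beta * Q(s, a)).
    Large beta -> max_a Q (greedy); small beta -> mean_a Q (cautious)."""
    m = max(q_values)  # shift for numerical stability
    return m + math.log(sum(math.exp(beta * (q - m)) for q in q_values)
                        / len(q_values)) / beta

def scheduled_beta(ensemble_q, scale=1.0):
    """Illustrative schedule: derive beta from the disagreement of an ensemble
    of Q-estimates over the same state-action pairs. High disagreement (high
    model uncertainty) -> small beta; low disagreement -> large beta."""
    disagreement = statistics.mean(statistics.pstdev(qs) for qs in zip(*ensemble_q))
    return scale / (disagreement + 1e-6)
```

As $\beta \to \infty$ the soft value recovers the hard max used in standard Q-learning, and as $\beta \to 0$ it approaches the mean over actions.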
no code implementations • 20 Oct 2021 • Roy Fox, Stephen McAleer, Will Overman, Ioannis Panageas
Recent results have shown that independent policy gradient converges in Markov potential games (MPGs), but it was not known whether Independent Natural Policy Gradient also converges in MPGs.
no code implementations • 7 Jun 2021 • Stephen McAleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox
Machine learning algorithms often make decisions on behalf of agents with varied and sometimes conflicting interests.
1 code implementation • 13 Mar 2021 • Le Cong Dinh, Yaodong Yang, Stephen McAleer, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou Ammar, Jun Wang
Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research, and artificial intelligence.
1 code implementation • NeurIPS 2021 • Stephen McAleer, John Lanier, Kevin Wang, Pierre Baldi, Roy Fox
NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.
no code implementations • 8 Feb 2021 • Forest Agostinelli, Alexander Shmakov, Stephen McAleer, Roy Fox, Pierre Baldi
We use Q* search to solve the Rubik's Cube when formulated with a large action space that includes 1872 meta-actions, and find that this 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time and less than a 3-fold increase in the number of nodes generated when performing Q* search.
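The reason large action spaces cost so little here is that one evaluation of the Q-function scores every successor of a node at once. A sketch on a toy domain (the number-line puzzle and `q_fn` are stand-ins for the cube and the learned network, not the paper's implementation):

```python
import heapq

def q_star_search(start, goal, actions, step, q_fn, max_nodes=100000):
    """A*-like best-first search that, like Q* search, scores all successors
    of a node with a single call to q_fn(state) -> {action: estimated
    cost-to-go after taking that action}. A child reached by action a gets
    priority g(parent) + 1 + Q(parent, a), with unit move costs."""
    frontier = [(0.0, 0.0, start, [])]  # (priority, path_cost, state, path)
    seen = set()
    while frontier and len(seen) < max_nodes:
        _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in seen:
            continue
        seen.add(state)
        q = q_fn(state)  # one evaluation covers every action
        for a in actions:
            heapq.heappush(frontier, (g + 1.0 + q[a], g + 1.0, step(state, a), path + [a]))
    return None

# Toy domain standing in for the cube: states are integers, actions shift the
# state, and the hypothetical q_fn returns an admissible cost-to-go estimate.
def make_q_fn(goal, actions):
    biggest = max(abs(a) for a in actions)  # each move changes the state by at most this
    def q_fn(state):
        return {a: abs((state + a) - goal) / biggest for a in actions}
    return q_fn
```

With an admissible, consistent estimate like this one, the search returns a shortest action sequence, e.g. reaching 7 from 0 with steps {+1, -1, +3, -3} in three moves.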
no code implementations • 10 Nov 2020 • Stephen McAleer, Alex Fast, Yuntian Xue, Magdalene Seiler, William Tang, Mihaela Balu, Pierre Baldi, Andrew W. Browne
The skin dataset includes 550 images for each of the resolution levels.
2 code implementations • NeurIPS 2020 • Stephen McAleer, John Lanier, Roy Fox, Pierre Baldi
We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$.
no code implementations • 10 Dec 2019 • Alexander Shmakov, John Lanier, Stephen McAleer, Rohan Achar, Cristina Lopes, Pierre Baldi
Much of recent success in multiagent reinforcement learning has been in two-player zero-sum games.
no code implementations • 18 Jun 2019 • Shauharda Khadka, Somdeb Majumdar, Santiago Miret, Stephen McAleer, Kagan Tumer
Training policies solely on the team-based reward is often difficult due to its sparsity.
1 code implementation • 9 Jun 2019 • John B. Lanier, Stephen McAleer, Pierre Baldi
Dealing with sparse rewards is a longstanding challenge in reinforcement learning.
no code implementations • ICLR 2019 • Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi
Autodidactic Iteration is able to learn how to solve the Rubik’s Cube and the 15-puzzle without relying on human data.
9 code implementations • 18 May 2018 • Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi
A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision.