no code implementations • 28 May 2024 • Jiaxiang Li, Siliang Zeng, Hoi-To Wai, Chenliang Li, Alfredo Garcia, Mingyi Hong
Moreover, we identify a connection between the proposed IRL-based approach and a recently proposed self-play approach, and show that self-play is a special case of modeling a reward-learning agent.
2 code implementations • 15 Sep 2023 • Ran Wei, Siliang Zeng, Chenliang Li, Alfredo Garcia, Anthony McDonald, Mingyi Hong
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL).
1 code implementation • NeurIPS 2023 • Siliang Zeng, Chenliang Li, Alfredo Garcia, Mingyi Hong
Offline inverse reinforcement learning (Offline IRL) aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
no code implementations • 4 Oct 2022 • Siliang Zeng, Chenliang Li, Alfredo Garcia, Mingyi Hong
To reduce the computational burden of a nested loop, novel methods such as SQIL [1] and IQ-Learn [2] emphasize policy estimation at the expense of reward estimation accuracy.
no code implementations • 4 Oct 2022 • Siliang Zeng, Mingyi Hong, Alfredo Garcia
Other approaches in the inverse reinforcement learning (IRL) literature emphasize policy estimation at the expense of reward estimation accuracy.
no code implementations • 11 Oct 2021 • Siliang Zeng, Tianyi Chen, Alfredo Garcia, Mingyi Hong
The flexibility in our design allows the proposed MARL-CAC algorithm to be used in a fully decentralized setting, where the agents can only communicate with their neighbors, as well as a federated setting, where the agents occasionally communicate with a server while optimizing their (partially personalized) local models.
no code implementations • NeurIPS 2021 • Prashant Khanduri, Siliang Zeng, Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang
We focus on bilevel problems where the lower-level subproblem is strongly convex and the upper-level objective function is smooth.
no code implementations • 20 Jun 2020 • Mingyi Hong, Siliang Zeng, Junyu Zhang, Haoran Sun
However, by constructing counterexamples, we show that when certain local Lipschitz conditions (LLC) on the local function gradients $\nabla f_i$ are not satisfied, most of the existing decentralized algorithms diverge, even if the global Lipschitz condition (GLC) is satisfied, that is, even if the sum function $f$ has a Lipschitz gradient.