no code implementations • 28 May 2024 • Longxiang He, Li Shen, Junbo Tan, Xueqian Wang
IDQL reinterprets IQL as an actor-critic method and derives the weights of the implicit policy; however, these weights hold only for the optimal value function.
1 code implementation • 9 Oct 2023 • Longxiang He, Li Shen, Linrui Zhang, Junbo Tan, Xueqian Wang
Constrained policy search (CPS) is a fundamental problem in offline reinforcement learning, which is generally solved by advantage weighted regression (AWR).