no code implementations • 18 Apr 2024 • Ruofan Wu, Junmin Zhong, Jennie Si
We prove qualitative properties of PAAC: learning convergence of the value and policy, solution optimality, and stability of the system dynamics.
no code implementations • 7 Nov 2023 • Junmin Zhong, Ruofan Wu, Jennie Si
We address estimation bias in deep reinforcement learning (DRL) by introducing solution mechanisms that include a new twin TD-regularized actor-critic (TDR) method.
no code implementations • 10 Oct 2022 • Junmin Zhong, Ruofan Wu, Jennie Si
However, this important aspect still lacks a comprehensive and systematic study demonstrating the effectiveness of multi-step methods in solving highly complex continuous control problems.