1 code implementation • 27 Dec 2021 • Yue Zhu, Mingyu Cai, Chris Schwarz, Junchao Li, Shaoping Xiao
First, the optimal policy obtained with PPO is compared with those obtained with DQN and DDQN.
Reinforcement Learning (RL)