no code implementations • 29 Aug 2020 • Kuangen Zhang, Jongwoo Lee, Zhimin Hou, Clarence W. de Silva, Chenglong Fu, Neville Hogan
This paper focuses on the latter because the structured policy is more intuitive and can inherit insights from previous model-based controllers.
no code implementations • 7 Feb 2020 • Zhimin Hou, Kuangen Zhang, Yi Wan, Dongyu Li, Chenglong Fu, Haoyong Yu
A common way to solve this problem, known as Mixture-of-Experts, is to represent the policy as the weighted sum of multiple components, where different components perform well on different parts of the state space.
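As a rough illustration of the weighted-sum idea described above, the following minimal sketch combines the outputs of several experts using state-dependent weights. It is not taken from the paper: the softmax gate, the linear experts, and every name and parameter value (`moe_policy`, `gate_weights`, `expert_weights`) are assumptions chosen only for illustration.

```python
import math


def softmax(scores):
    """Convert raw gate scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def moe_policy(state, gate_weights, expert_weights):
    """Return (action, weights) for a scalar-action mixture-of-experts policy.

    Assumption: the gate and each expert are linear in the state.
    """
    # One gate score per expert, normalized into mixture weights.
    weights = softmax([dot(g, state) for g in gate_weights])
    # Each expert proposes its own action for this state.
    expert_actions = [dot(e, state) for e in expert_weights]
    # The policy output is the weighted sum of the expert proposals.
    action = sum(w * a for w, a in zip(weights, expert_actions))
    return action, weights


# Hypothetical parameters: two experts over a 2-D state.
# The gate favors expert 0 when state[0] > 0 and expert 1 when state[0] < 0.
gate_weights = [[4.0, 0.0], [-4.0, 0.0]]
expert_weights = [[1.0, 0.0], [0.0, -1.0]]

action_pos, weights_pos = moe_policy([2.0, 1.0], gate_weights, expert_weights)
action_neg, weights_neg = moe_policy([-2.0, 1.0], gate_weights, expert_weights)
```

With these illustrative values, the gate hands almost all of the weight to expert 0 in the region where `state[0]` is positive and to expert 1 where it is negative, so each expert effectively governs its own part of the state space.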
1 code implementation • 22 Oct 2019 • Kuangen Zhang, Zhimin Hou, Clarence W. de Silva, Haoyong Yu, Chenglong Fu
However, local minima caused by unsuitable rewards and overestimation of the cumulative reward both impede maximizing that reward.