no code implementations • 24 May 2023 • Wenjian Hao, Paulo C. Heredia, Bowen Huang, Zehui Lu, Zihao Liang, Shaoshuai Mou
This paper proposes a policy learning algorithm based on the Koopman operator theory and policy gradient approach, which seeks to approximate an unknown dynamical system and search for optimal policy simultaneously, using the observations gathered through interaction with the environment.