1 code implementation • 7 Dec 2022 • Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, Michael L. Littman
Reinforcement-learning agents seek to maximize a reward signal through environmental interactions.
no code implementations • 30 May 2022 • Henry Sowerby, Zhiyuan Zhou, Michael L. Littman
To solve this optimization problem, we propose a linear-programming-based algorithm that efficiently finds a reward function that maximizes the action gap and minimizes the subjective discount.
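As a rough illustration of how reward design can be posed as a linear program, the sketch below maximizes an action gap for a fixed discount. It is not the authors' algorithm. Every symbol in it is an assumption introduced for this example: a finite MDP with states $S$, actions $A$, and transition probabilities $P(s' \mid s, a)$; a target policy $\pi$; a fixed discount $\gamma \in [0, 1)$; rewards bounded in $[-1, 1]$; and decision variables $R$, $V$, and $\delta$.

```latex
\begin{align*}
\max_{R,\,V,\,\delta}\quad & \delta \\
\text{s.t.}\quad
 & V(s) = R(s, \pi(s)) + \gamma \sum_{s'} P(s' \mid s, \pi(s))\, V(s')
   && \forall s \in S, \\
 & V(s) \ge R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') + \delta
   && \forall s \in S,\ a \ne \pi(s), \\
 & -1 \le R(s, a) \le 1
   && \forall s \in S,\ a \in A.
\end{align*}
```

With $\gamma$ held fixed, every constraint is linear in $(R, V, \delta)$. Any feasible solution with $\delta \ge 0$ makes $\pi$ optimal under $R$, with every non-target action worse by at least $\delta$ in each state. Under this formulation, a smaller discount would have to be handled separately, for example by re-solving the program over a range of $\gamma$ values; whether the paper proceeds this way is not stated in the excerpt above.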