no code implementations • 30 Jan 2023 • Hélène Plisnier, Denis Steckelmacher, Jeroen Willems, Bruno Depraetere, Ann Nowé
Many instances of similar or almost-identical industrial machines or tools are often deployed at once or in quick succession.
no code implementations • 18 Jul 2019 • Hélène Plisnier, Denis Steckelmacher, Diederik Roijers, Ann Nowé
After training in the lab, the robot should be able to get by without the expensive equipment that used to be available to it, and yet still be guaranteed to perform well in the field.
1 code implementation • 11 Mar 2019 • Denis Steckelmacher, Hélène Plisnier, Diederik M. Roijers, Ann Nowé
We argue that actor-critic algorithms are limited by their need for an on-policy critic.
no code implementations • 7 Feb 2019 • Hélène Plisnier, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé
In this paper, we propose an elegant solution, the Actor-Advisor architecture, in which a Policy Gradient actor learns from unbiased Monte-Carlo returns, while being shaped (or advised) by the Softmax policy arising from an off-policy critic.
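The shaping step described above can be sketched as policy mixing: the actor's action distribution is multiplied element-wise by the Softmax policy derived from the critic's Q-values, then renormalized. This is a minimal illustrative sketch, not the paper's implementation; the function names, the temperature parameter, and the element-wise-product mixing rule are assumptions for illustration.

```python
import numpy as np

def softmax(q, temperature=1.0):
    """Softmax policy over Q-values (hypothetical helper)."""
    z = q / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def shaped_policy(actor_probs, critic_q, temperature=1.0):
    """Shape the actor's distribution with the critic's Softmax policy.

    Assumes element-wise-product mixing followed by renormalization,
    so actions the advisor rates highly gain probability mass.
    """
    advisor_probs = softmax(critic_q, temperature)
    mixed = actor_probs * advisor_probs
    return mixed / mixed.sum()
```

For example, with a uniform actor over three actions and an advisor that strongly prefers the first action, the shaped policy concentrates probability mass on that action while remaining a valid distribution.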
no code implementations • 13 Aug 2018 • Hélène Plisnier, Denis Steckelmacher, Tim Brys, Diederik M. Roijers, Ann Nowé
Our technique, Directed Policy Gradient (DPG), lets a teacher or backup policy override the agent before it acts undesirably, while still allowing the agent to leverage human advice or directives to learn faster.
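The override mechanism can be sketched as a thin wrapper around action selection: the agent samples from its own policy, but if the sampled action falls in a set the teacher marks as undesirable, the backup policy's action is taken instead. This is a hedged sketch under stated assumptions, not the DPG algorithm itself; the function name, the `forbidden` set, and the single backup action are hypothetical simplifications.

```python
import numpy as np

def act_with_override(agent_probs, forbidden, backup_action, rng):
    """Sample an action from the agent's policy, but let a teacher
    override it before execution.

    agent_probs:   the agent's action distribution (assumed given)
    forbidden:     set of actions the teacher deems undesirable
    backup_action: the backup policy's action (hypothetical single action)
    """
    action = rng.choice(len(agent_probs), p=agent_probs)
    if action in forbidden:
        # Teacher intervenes before the undesirable action is executed
        return backup_action
    return action
```

A fuller treatment would let the teacher supply a whole advice distribution rather than a single backup action, but the safety property is the same: the overridden action is never executed in the environment.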
no code implementations • 22 Aug 2017 • Denis Steckelmacher, Diederik M. Roijers, Anna Harutyunyan, Peter Vrancx, Hélène Plisnier, Ann Nowé
Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability.