no code implementations • 9 Dec 2021 • Agustin Castellano, Hancheng Min, Juan Bazerque, Enrique Mallada
We argue that stationary policies are not sufficient for solving this problem, and that a rich class of policies can be found by endowing the controller with a scalar quantity, the so-called budget, that tracks how close the agent is to violating the constraint.
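As a rough illustration of the budget idea in this abstract, the sketch below carries a scalar alongside the state and lets the action depend on it. The names `BudgetTracker` and `budgeted_policy`, the per-step cost, and the threshold of 1.0 are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Hypothetical helper: carries a scalar budget alongside the MDP state."""

    budget: float  # remaining allowance before the constraint is violated

    def step(self, cost: float) -> float:
        # Spend the cost incurred by the latest transition.
        self.budget -= cost
        return self.budget

    def violated(self) -> bool:
        # A negative budget means the constraint has been broken.
        return self.budget < 0


def budgeted_policy(state: int, budget: float) -> str:
    # Illustrative rule (an assumption): the action depends on the remaining
    # budget, not only on the state, so the same state can map to
    # different actions over time.
    return "safe" if budget <= 1.0 else "explore"
```

With an initial budget of 3.0, the rule returns "explore"; after a transition of cost 2.0 the budget is 1.0 and the same state yields "safe".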
no code implementations • 18 May 2021 • Agustin Castellano, Hancheng Min, Juan Bazerque, Enrique Mallada
Our analysis further highlights a trade-off between the time lag for the underlying MDP necessary to detect unsafe actions, and the level of exposure to unsafe events.
no code implementations • 24 Dec 2020 • Agustin Castellano, Juan Bazerque, Enrique Mallada
We consider the problem of finding optimal policies for a Markov Decision Process with almost sure constraints on state transitions and action triplets.
no code implementations • 1 Oct 2020 • Agustin Castellano, Juan Bazerque, Enrique Mallada
More precisely, by defining a handicap metric that counts the number of unsafe actions, we provide an algorithm for discarding unsafe machines (or actions), with probability one, that achieves constant handicap.
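The handicap metric can be illustrated with a toy simulation. This is a minimal sketch and not the authors' algorithm: it assumes that a safe machine never emits a damage signal, so a machine can be discarded with certainty the first time damage is observed. The function name `discard_unsafe` and the `damage_prob` parameter are hypothetical.

```python
import random


def discard_unsafe(damage_prob, rounds, seed=0):
    """Pull every active machine once per round; drop a machine on first damage.

    damage_prob[i] is the (assumed) probability that machine i emits a damage
    signal when pulled; a value of 0.0 marks a safe machine.
    Returns the surviving machines and the handicap, i.e. the number of
    pulls spent on unsafe machines.
    """
    rng = random.Random(seed)
    active = set(range(len(damage_prob)))
    handicap = 0
    for _ in range(rounds):
        for arm in sorted(active):  # sorted() copies, so discarding is safe
            if damage_prob[arm] > 0:
                handicap += 1  # this pull was an unsafe action
            if rng.random() < damage_prob[arm]:
                active.discard(arm)  # damage observed: never pull again
    return active, handicap
```

Once every unsafe machine has been discarded, the handicap stops growing no matter how many further rounds are played, which is the sense in which it stays constant in this toy setting.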