no code implementations • 10 Sep 2023 • Fan Lu, Sean Meyn
The main contributions concern properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution and a significant relationship between the solution to the new convex program and the solution to standard Q-learning.
no code implementations • 6 Sep 2023 • Caio Kalil Lauand, Sean Meyn
The remaining results are established for linear SA recursions: (ii) the bivariate parameter-disturbance process is geometrically ergodic in a topological sense; (iii) the representation for bias has a simpler form in this case, and cannot be expected to be zero if there is multiplicative noise; (iv) the asymptotic covariance of the averaged parameters is within $O(\alpha)$ of optimal.
no code implementations • 5 Jul 2023 • Sean Meyn
The algorithm is a general approach to stochastic approximation which in particular applies to Q-learning with "oblivious" training even with non-linear function approximation.
no code implementations • 10 Jan 2023 • Gian Paramo, Arturo Bretas, Sean Meyn
This technique holds several advantages over contemporary techniques: it utilizes technology that is already deployed in the field, it offers a significant degree of generality, and so far it has displayed a very high level of sensitivity without sacrificing accuracy.
no code implementations • 20 Dec 2022 • Austin Cooper, Arturo Bretas, Sean Meyn, Newton G. Bretas
This paper presents a model for detecting high-impedance faults (HIFs) using parameter error modeling and a two-step per-phase weighted least squares state estimation (SE) process.
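The weighted least squares (WLS) estimator underlying the state estimation step above is classical: given measurements $z = Hx + e$ with error covariance $R$, the estimate is $\hat{x} = (H^\top W H)^{-1} H^\top W z$ with $W = R^{-1}$. A minimal sketch follows; the measurement matrix, covariances, and two-state example are invented for illustration and are not taken from the paper.

```python
import numpy as np

def wls_estimate(H, z, R):
    """Weighted least squares state estimate x_hat = (H'WH)^{-1} H'W z."""
    W = np.linalg.inv(R)
    G = H.T @ W @ H                  # gain matrix
    return np.linalg.solve(G, H.T @ W @ z)

# Toy system: two states, three redundant measurements (illustrative only).
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x_true = np.array([1.0, 2.0])
R = np.diag([0.01, 0.01, 0.04])      # less trust in the third meter
rng = np.random.default_rng(0)
z = H @ x_true + rng.normal(0.0, np.sqrt(np.diag(R)))
x_hat = wls_estimate(H, z, R)
```

Measurement redundancy (three meters for two states) is what allows residual-based tests, such as the parameter-error modeling in the paper, to flag anomalous measurements.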
no code implementations • 20 Dec 2022 • Austin Cooper, Arturo Bretas, Sean Meyn, Newton G. Bretas
Distribution systems of the future smart grid require enhancements to the reliability of distribution system state estimation (DSSE) in the face of low measurement redundancy, unsynchronized measurements, and dynamic load profiles.
no code implementations • 17 Oct 2022 • Fan Lu, Prashant Mehta, Sean Meyn, Gergely Neu
The main contributions follow: (i) The dual of convex Q-learning is not precisely Manne's LP or a version of logistic Q-learning, but it has a similar structure that reveals the need for regularization to avoid over-fitting.
no code implementations • 14 Oct 2022 • Fan Lu, Joel Mathias, Sean Meyn, Karanjit Kalsi
Convex Q-learning is a recent approach to reinforcement learning, motivated by the possibility of a firmer convergence theory and of exploiting greater a priori knowledge regarding policy or value function structure.
no code implementations • 27 Oct 2021 • Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis, Sean Meyn
In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated \textit{mean flow} $\tfrac{d}{dt} \vartheta_t = \bar{f}(\vartheta_t)$ is globally asymptotically stable with stationary point denoted $\theta^*$, where $\bar{f}(\theta)=\mathrm{E}[f(\theta,\Phi)]$ with $\Phi$ having the stationary distribution of the chain.
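The mean-flow setup above can be simulated directly: run the stochastic approximation recursion $\theta_{n+1} = \theta_n + a_{n+1} f(\theta_n, \Phi_{n+1})$ with Markovian noise and check that the iterates approach the stationary point of $\tfrac{d}{dt}\vartheta_t = \bar{f}(\vartheta_t)$. The two-state chain and the choice $f(\theta,\phi) = \phi - \theta$ below are toy choices for illustration, not examples from the paper.

```python
import numpy as np

# SA recursion driven by a Markov chain:
#   theta_{n+1} = theta_n + a_n * f(theta_n, Phi_n),   a_n = g / n.
# With f(theta, phi) = phi - theta, the mean flow is
#   d/dt theta_t = m - theta_t,   m = E_pi[Phi],
# globally asymptotically stable at theta* = m.
rng = np.random.default_rng(1)
states = np.array([0.0, 10.0])
Ptrans = np.array([[0.9, 0.1],       # symmetric chain: uniform stationary
                   [0.1, 0.9]])      # distribution, so theta* = 5

theta, idx = 0.0, 0
for n in range(1, 100_001):
    idx = rng.choice(2, p=Ptrans[idx])        # next Markov state Phi_n
    a_n = 2.0 / n                             # vanishing step size
    theta += a_n * (states[idx] - theta)      # f(theta, Phi) = Phi - theta
theta_star = 5.0                              # root of the mean flow
```

Even though the noise is strongly correlated (the chain is "sticky"), the iterates converge to the root of the mean flow; the bias and covariance representations in the abstract quantify how fast.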
no code implementations • 30 Sep 2020 • Shuhang Chen, Adithya Devraj, Andrey Bernstein, Sean Meyn
(ii) With gain $a_t = g/(1+t)$ the results are not as sharp: the rate of convergence $1/t$ holds only if $I + g A^*$ is Hurwitz.
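The role of the Hurwitz condition in (ii) is visible even in the noise-free (mean-flow) analogue: for a scalar linear recursion with gain $g/(1+t)$, the error decays roughly like $t^{-g|A|}$, so the $O(1/t)$ rate requires $g|A| > 1$, the scalar version of "$I + gA^*$ is Hurwitz". The sketch below uses an invented scalar example ($A=-1$, $b=1$) to show the two regimes.

```python
import numpy as np

# Deterministic analogue of the gain condition: with step a_t = g/(1+t),
#   theta_{t+1} = theta_t + a_t * (A*theta_t + b),   A = -1, b = 1,
# the error |theta_t - theta*| shrinks like t^{-g|A|}  (theta* = -b/A = 1).
def run(g, T=100_000, A=-1.0, b=1.0):
    theta = 0.0
    for t in range(T):
        theta += g / (1.0 + t) * (A * theta + b)
    return abs(theta - 1.0)          # distance to theta* = 1

err_slow = run(g=0.5)   # 1 + g*A = 0.5: not Hurwitz, decay ~ t^{-1/2}
err_fast = run(g=1.5)   # 1 + g*A = -0.5: Hurwitz, decay faster than 1/t
```

With noise, the same threshold separates the cases where the $1/t$ mean-square rate does and does not hold, which is what motivates Ruppert-Polyak averaging as a gain-insensitive alternative.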
no code implementations • 7 Feb 2020 • Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean Meyn
This is motivation for the focus on mean square error bounds for parameter estimates.
no code implementations • 17 Sep 2018 • Adithya M. Devraj, Ana Bušić, Sean Meyn
Two SA techniques are known to achieve optimal asymptotic variance: the Ruppert-Polyak averaging technique and stochastic Newton-Raphson (SNR).
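The Ruppert-Polyak scheme mentioned above can be sketched in a few lines: run SA with a "slow" step size $a_n = n^{-\rho}$, $\rho \in (1/2, 1)$, and report the running average of the iterates, which achieves the optimal asymptotic variance without knowledge of the gain matrix. The i.i.d. mean-estimation problem below is a toy illustration, not an example from the paper.

```python
import numpy as np

# Polyak-Ruppert averaging: slow-step SA iterate plus running average.
#   theta_{n+1} = theta_n + n^{-0.85} * (Y_n - theta_n),   E[Y_n] = theta*,
#   theta_bar_n = (1/n) * sum_{k<=n} theta_k.
rng = np.random.default_rng(2)
theta, theta_bar = 0.0, 0.0
N = 100_000
for n in range(1, N + 1):
    y = rng.normal(3.0, 1.0)              # observation with mean theta* = 3
    theta += n ** -0.85 * (y - theta)     # slow-step SA iterate
    theta_bar += (theta - theta_bar) / n  # running average of iterates
```

The averaged sequence `theta_bar` attains the optimal $O(1/n)$ variance; SNR reaches the same optimum by instead estimating and inverting the linearization, at the cost of a matrix inversion per step.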
no code implementations • NeurIPS 2017 • Adithya M. Devraj, Sean Meyn
The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects.
no code implementations • 6 Jul 2013 • Wei Chen, Dayu Huang, Ankur A. Kulkarni, Jayakrishnan Unnikrishnan, Quanyan Zhu, Prashant Mehta, Sean Meyn, Adam Wierman
Neuro-dynamic programming is a class of powerful techniques for approximating the solution to dynamic programming equations.