no code implementations • 17 Mar 2024 • Zixian Yang, Lei Ying
We prove that our proposed algorithm yields a sublinear regret $\tilde{O}(T^{5/6})$ and queue-length bound $\tilde{O}(T^{2/3})$, where $T$ is the time horizon.
no code implementations • 26 Feb 2024 • Kellen Kanarios, Qining Zhang, Lei Ying
In this paper, we study a best arm identification problem with dual objects.
no code implementations • 22 Dec 2023 • Honghao Wei, Xin Liu, Lei Ying
This paper studies safe Reinforcement Learning (safe RL) with linear function approximation and under hard instantaneous constraints where unsafe actions must be avoided at each step.
no code implementations • 27 Sep 2023 • Zihan Zhou, Honghao Wei, Lei Ying
PRI achieves trio objectives: (i) PRI is a model-free algorithm; and (ii) it outputs an approximately optimal policy with a high probability at the end of learning; and (iii) PRI guarantees $\tilde{\mathcal{O}}(H\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(H^4 \sqrt{SA}K^{\frac{4}{5}})$ under a model-free algorithm, where $H$ is the length of each episode, $S$ is the number of states, $A$ is the number of actions, and the total number of episodes during learning is $2K+\tilde{\cal O}(K^{0. 25}).$ We further present a matching lower via an example that shows under any online learning algorithm, there exists a well-separated CMDP instance such that either the regret or violation has to be $\Omega(H\sqrt{K}),$ which matches the upper bound by a polylogarithmic factor.
1 code implementation • 1 Jun 2023 • Ruizhong Qiu, Dingsu Wang, Lei Ying, H. Vincent Poor, Yifang Zhang, Hanghang Tong
They are exclusively based on the maximum likelihood estimation (MLE) formulation and require to know true diffusion parameters.
no code implementations • 10 Mar 2023 • Honghao Wei, Arnob Ghosh, Ness Shroff, Lei Ying, Xingyu Zhou
We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov Decision Processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a cumulative constraint on the expected utility (cost).
no code implementations • 5 Feb 2023 • Xin Liu, Zixian Yang, Lei Ying
This subroutine also achieves the state-of-the-art regret and constraint violation bounds for constrained online convex optimization problems, which is of independent interest.
no code implementations • 26 Jan 2023 • Xian Yu, Lei Ying
Risk-sensitive reinforcement learning (RL) has become a popular tool to control the risk of uncertain outcomes and ensure reliable performance in various sequential decision-making problems.
no code implementations • 4 Jan 2023 • Kaiyi Ji, Lei Ying
In this paper, we provide a new solution using a distributed and data-driven bilevel optimization approach, where the lower level is a distributed network utility maximization (NUM) algorithm with concave surrogate utility functions, and the upper level is a data-driven learning algorithm to find the best surrogate utility functions that maximize the sum of true network utility.
no code implementations • 13 Dec 2022 • Xin Liu, Honghao Wei, Lei Ying
The proposed algorithm is distributed in two aspects: (i) the learned policy is a distributed policy that maps a local state of an agent to its local action and (ii) the learning/training is distributed, during which each agent updates its policy based on its own and neighbors' information.
Multi-agent Reinforcement Learning reinforcement-learning +1
no code implementations • 2 Sep 2022 • Zixian Yang, R. Srikant, Lei Ying
We prove that under our algorithm the asymptotic average queue length is bounded by one divided by the traffic slackness, which is order-wise optimal.
no code implementations • 27 May 2022 • Kaiyi Ji, Mingrui Liu, Yingbin Liang, Lei Ying
Existing studies in the literature cover only some of those implementation choices, and the complexity bounds available are not refined enough to enable rigorous comparison among different implementations.
no code implementations • 26 May 2022 • Zixian Yang, Xin Liu, Lei Ying
To understand the exploration, exploitation, and engagement in these systems, we propose a new model, called MAB-A where "A" stands for abandonment and the abandonment probability depends on the current recommended item and the user's past experience (called state).
no code implementations • 13 Nov 2021 • Jueming Hu, Xuxi Yang, Weichang Wang, Peng Wei, Lei Ying, Yongming Liu
Obstacle avoidance for small unmanned aircraft is vital for the safety of future urban air mobility (UAM) and Unmanned Aircraft System (UAS) Traffic Management (UTM).
no code implementations • 3 Jun 2021 • Honghao Wei, Xin Liu, Lei Ying
This paper presents the first model-free, simulator-free reinforcement learning algorithm for Constrained Markov Decision Processes (CMDPs) with sublinear regret and zero constraint violation.
no code implementations • NeurIPS 2021 • Xin Liu, Bin Li, Pengyi Shi, Lei Ying
Thus, the overall computational complexity of our algorithm is similar to that of the linear UCB for unconstrained stochastic linear bandits.
no code implementations • 20 Oct 2020 • Xin Liu, Bin Li, Pengyi Shi, Lei Ying
This paper considers constrained online dispatching with unknown arrival, reward and constraint distributions.
2 code implementations • 4 Oct 2020 • Honghao Wei, Lei Ying
In this paper, we propose a new type of Actor, named forward-looking Actor or FORK for short, for Actor-Critic algorithms.
1 code implementation • NeurIPS 2020 • Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant
In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning.
1 code implementation • NeurIPS 2019 • Harsh Gupta, R. Srikant, Lei Ying
We study two time-scale linear stochastic approximation algorithms, which can be used to model well-known reinforcement learning algorithms such as GTD, GTD2, and TDC.
no code implementations • 4 Mar 2019 • Honghao Wei, Xiaohan Kang, Weina Wang, Lei Ying
The algorithm consists of an offline machine learning algorithm for learning the probabilistic information spreading model and an online optimal stopping algorithm to detect misinformation.
no code implementations • 3 Feb 2019 • R. Srikant, Lei Ying
We consider the dynamics of a linear stochastic approximation algorithm driven by Markovian noise, and derive finite-time bounds on the moments of the error, i. e., deviation of the output of the algorithm from the equilibrium point of an associated ordinary differential equation (ODE).
no code implementations • 6 Mar 2014 • Kai Zhu, Rui Wu, Lei Ying, R. Srikant
In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items are clustered, and further, we assume that some users rate many items (information-rich users) and some users rate only a few items (information-sparse users).
no code implementations • 1 Oct 2013 • Jiaming Xu, Rui Wu, Kai Zhu, Bruce Hajek, R. Srikant, Lei Ying
In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure.