no code implementations • 30 May 2023 • Ronshee Chawla, Daniel Vial, Sanjay Shakkottai, R. Srikant
The study of collaborative multi-agent bandits has attracted significant attention recently.
no code implementations • 23 Mar 2022 • Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant
Cascading bandits is a natural and popular model that frames the task of learning to rank from Bernoulli click feedback in a bandit setting.
no code implementations • 28 Feb 2022 • Daniel Vial, Sanjay Shakkottai, R. Srikant
Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i) = m$), and show that the effect of malicious agents is entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret).
no code implementations • 12 Sep 2021 • Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant
(P1) Its regret after $K$ episodes scales as $K \max \{ \varepsilon_{\text{mis}}, \varepsilon_{\text{tol}} \}$, where $\varepsilon_{\text{mis}}$ is the degree of misspecification and $\varepsilon_{\text{tol}}$ is a user-specified error tolerance.
no code implementations • 4 May 2021 • Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant
We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP).
no code implementations • 4 Dec 2020 • Daniel Vial, Sanjay Shakkottai, R. Srikant
We consider a variant of the traditional multi-armed bandit problem in which each arm can provide only one-bit feedback on each pull, based on its history of rewards.
no code implementations • 7 Jul 2020 • Daniel Vial, Sanjay Shakkottai, R. Srikant
Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret.
no code implementations • 18 Feb 2020 • Daniel Vial, Vijay Subramanian
We devise and analyze algorithms for the empirical policy evaluation problem in reinforcement learning.
1 code implementation • 4 Jun 2017 • Daniel Vial, Vijay Subramanian
We then show that the common underlying graph can be leveraged to efficiently and jointly estimate PPR for many pairs, rather than treating each pair separately using the primitive algorithm.
Social and Information Networks