no code implementations • 2 Feb 2024 • Tanishq Kumar, Kevin Luo, Mark Sellke
We put forward a theoretical explanation for this, based on the model's effective parameter count, $p_\text{eff}$, given by the sum of the number of non-zero weights in the final network and the mutual information between the sparsity mask and the data.
no code implementations • 3 Jun 2023 • Xiao-Yue Gong, Mark Sellke
For fixed budget, we show the asymptotically optimal sample complexity as $\delta\to 0$ is $c^{-1}\log(1/\delta)\big(\log\log(1/\delta)\big)^2$ to leading order.
no code implementations • 3 Jun 2023 • Mark Sellke
We study the sample complexity of learning ReLU neural networks from the point of view of generalization.
no code implementations • 3 Jun 2023 • Mark Sellke
We advance the study of incentivized bandit exploration, in which arm choices are viewed as recommendations and are required to be Bayesian incentive compatible.
no code implementations • 10 Jun 2022 • Sitan Chen, Brice Huang, Jerry Li, Allen Liu, Mark Sellke
We give an adaptive algorithm that outputs a state which is $\gamma$-close in infidelity to $\rho$ using only $\tilde{O}(d^3/\gamma)$ copies, which is optimal for incoherent measurements.
no code implementations • 19 Feb 2022 • Allen Liu, Mark Sellke
We ask whether it is possible to obtain optimal instance-dependent regret $\tilde{O}(1/\Delta)$, where $\Delta$ is the gap between the $m$-th and $(m+1)$-st best arms.
no code implementations • 18 Jun 2021 • Yining Chen, Elan Rosenfeld, Mark Sellke, Tengyu Ma, Andrej Risteski
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
no code implementations • NeurIPS 2021 • Sébastien Bubeck, Mark Sellke
Classically, data interpolation with a parametrized model class is possible as long as the number of parameters is larger than the number of equations to be satisfied.
no code implementations • 23 Nov 2020 • Josh Alman, Timothy Chu, Gary Miller, Shyam Narayanan, Mark Sellke, Zhao Song
This completes the theory of Manhattan to Manhattan metric transforms initiated by Assouad in 1980.
no code implementations • 8 Nov 2020 • Sébastien Bubeck, Thomas Budzinski, Mark Sellke
We consider the cooperative multi-player version of the stochastic multi-armed bandit problem.
no code implementations • 29 Oct 2020 • Ahmed El Alaoui, Mark Sellke
In this paper we design an efficient algorithm which, given oracle access to the solution of the Parisi variational principle, exploits this conjectured FRSB structure for $\kappa<0$ and outputs a vector $\hat{\sigma}$ satisfying $\langle g_a , \hat{\sigma}\rangle \ge \kappa \sqrt{N}$ for all $1\le a \le M$ and lying on a sphere of non-trivial radius $\sqrt{\bar{q} N}$, where $\bar{q} \in (0, 1)$ is the right-end of the support of the associated Parisi measure.
Probability • Data Structures and Algorithms • Mathematical Physics
no code implementations • 15 Apr 2020 • Sébastien Bubeck, Yuval Rabani, Mark Sellke
We introduce the problem of $k$-chasing of convex functions, a simultaneous generalization of both the famous $k$-server problem in $\mathbb{R}^d$ and the problem of chasing convex bodies and functions.
no code implementations • 3 Feb 2020 • Mark Sellke, Aleksandrs Slivkins
The performance loss due to incentives is therefore limited to the initial rounds when these data points are collected.
no code implementations • 28 Apr 2019 • Sébastien Bubeck, Yuanzhi Li, Yuval Peres, Mark Sellke
We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem.
no code implementations • 2 Feb 2019 • Sébastien Bubeck, Mark Sellke
Second, we replace the entropy over combinatorial actions by a coordinate entropy, which allows us to obtain the first optimal worst-case bound for Thompson Sampling in the combinatorial setting.
no code implementations • 31 Oct 2017 • Boris Hanin, Mark Sellke
Specifically, we answer the following question: for a fixed $d_{in}\geq 1$, what is the minimal width $w$ so that neural nets with ReLU activations, input dimension $d_{in}$, hidden layer widths at most $w$, and arbitrary depth can approximate any continuous, real-valued function of $d_{in}$ variables arbitrarily well?