no code implementations • 4 Oct 2023 • Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu
Second, they can undergo a period of classical, harmful overfitting -- achieving a perfect fit to training data with near-random performance on test data -- before transitioning ("grokking") to near-optimal generalization later in training.
no code implementations • 28 Jul 2023 • Nirmit Joshi, Gal Vardi, Nathan Srebro
We show overfitting is tempered (with high probability) when measured with respect to the $L_1$ loss, but also show that the situation is more complex than suggested by Mallinar et al.
no code implementations • 22 Jun 2023 • Lijia Zhou, James B. Simon, Gal Vardi, Nathan Srebro
We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model.
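The cost of overfitting defined above is easy to compute numerically on a toy problem: fit a ridgeless (vanishingly small ridge) KRR interpolant and the best model over a ridge grid on noisy data, then take the ratio of their test errors. This is an illustrative sketch, not the paper's construction; the RBF kernel, data distribution, and grid are all assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def krr_test_error(Xtr, ytr, Xte, yte, lam):
    # Kernel ridge regression: alpha = (K + lam * n * I)^{-1} y.
    n = len(Xtr)
    K = rbf_kernel(Xtr, Xtr)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), ytr)
    pred = rbf_kernel(Xte, Xtr) @ alpha
    return np.mean((pred - yte) ** 2)

# Noisy 1-D regression data; test error is measured against the clean target.
Xtr = rng.uniform(-1, 1, (40, 1))
ytr = np.sin(3 * Xtr[:, 0]) + 0.3 * rng.standard_normal(40)
Xte = rng.uniform(-1, 1, (500, 1))
yte = np.sin(3 * Xte[:, 0])

grid = np.logspace(-10, 1, 45)          # includes the ridgeless endpoint
ridgeless = krr_test_error(Xtr, ytr, Xte, yte, lam=1e-10)
tuned = min(krr_test_error(Xtr, ytr, Xte, yte, lam=l) for l in grid)
print("cost of overfitting ~", ridgeless / tuned)
```

Since the grid contains the ridgeless value itself, the ratio is at least 1 by construction; how much larger it is measures how harmful interpolating the noise was.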
no code implementations • 5 May 2023 • Gon Buzaglo, Niv Haim, Gilad Yehudai, Gal Vardi, Michal Irani
Reconstructing samples from the training set of trained neural networks is a major privacy concern.
no code implementations • 2 Mar 2023 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization.
no code implementations • 13 Oct 2022 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data.
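The near-orthogonality property mentioned above is a generic feature of high-dimensional data and is easy to verify numerically: for independent Gaussian samples, pairwise inner products scale like $\sqrt{d}$ while squared norms scale like $d$. A minimal check (illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 10_000            # few samples, high dimension
X = rng.standard_normal((n, d))

G = X @ X.T                  # Gram matrix
sq_norms = np.diag(G)        # ||x_i||^2 concentrates around d
off_diag = G[~np.eye(n, dtype=bool)]

# |<x_i, x_j>| is O(sqrt(d)), far below ||x_i||^2 = O(d) for i != j.
print("min squared norm:", sq_norms.min())
print("max off-diagonal |inner product|:", np.abs(off_diag).max())
```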
no code implementations • 26 Aug 2022 • Gal Vardi
Gradient-based deep-learning algorithms exhibit remarkable performance in practice, but it is not well-understood why they are able to generalize despite having more parameters than training examples.
1 code implementation • 15 Jun 2022 • Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, Michal Irani
We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods.
no code implementations • 18 May 2022 • Itay Safran, Gal Vardi, Jason D. Lee
We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks with a single hidden layer in a binary classification setting.
no code implementations • 13 Feb 2022 • Gal Vardi, Ohad Shamir, Nathan Srebro
We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm.
no code implementations • 9 Feb 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir
Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples.
no code implementations • 8 Feb 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir
We solve an open question from Lu et al. (2017), by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a network of width $O(d)$ (independent of the target network's architecture), whose number of parameters is larger essentially only by a linear factor.
no code implementations • 30 Jan 2022 • Nadav Timor, Gal Vardi, Ohad Shamir
We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices.
no code implementations • ICLR 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir
We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.
no code implementations • 6 Oct 2021 • Gal Vardi, Ohad Shamir, Nathan Srebro
The implicit bias of neural networks has been extensively studied in recent years.
no code implementations • NeurIPS 2021 • Gal Vardi, Gilad Yehudai, Ohad Shamir
We theoretically study the fundamental problem of learning a single neuron with a bias term ($\mathbf{x} \mapsto \sigma(\langle \mathbf{w},\mathbf{x}\rangle + b)$) in the realizable setting with the ReLU activation, using gradient descent.
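The setting above can be sketched in a few lines: labels are generated by a ground-truth ReLU neuron with a bias (the realizable case), and plain gradient descent on the squared loss is run from a small random initialization. This is a hedged toy illustration under assumed hyperparameters, not the paper's analysis; note that initializing at exactly zero would leave every ReLU inactive and the gradient identically zero.

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda z: np.maximum(z, 0.0)

# Realizable setting: labels come from a ground-truth neuron (w*, b*).
d, n = 5, 1000
w_star = rng.standard_normal(d)
b_star = 0.5
X = rng.standard_normal((n, d))
y = relu(X @ w_star + b_star)

# Gradient descent on the squared loss over (w, b), small random init.
w = 0.1 * rng.standard_normal(d)
b = 0.1
lr = 0.2
for _ in range(5000):
    z = X @ w + b
    grad_out = (relu(z) - y) * (z > 0) / n   # chain rule through the ReLU
    w -= lr * (X.T @ grad_out)
    b -= lr * grad_out.sum()

loss = np.mean((relu(X @ w + b) - y) ** 2)
print("final training loss:", loss)
```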
no code implementations • 30 Jan 2021 • Gal Vardi, Daniel Reichman, Toniann Pitassi, Ohad Shamir
We show a complexity-theoretic barrier to proving such results beyond size $O(d\log^2(d))$, but also show an explicit benign function, that can be approximated with networks of size $O(d)$ and not with networks of size $o(d/\log d)$.
no code implementations • 20 Jan 2021 • Amit Daniely, Gal Vardi
We also establish lower bounds on the complexity of learning intersections of a constant number of halfspaces, and ReLU networks with a constant number of hidden neurons.
1 code implementation • 9 Dec 2020 • Gal Vardi, Ohad Shamir
For one hidden-layer networks, we prove a similar result, where in general it is impossible to characterize implicit regularization properties in this manner, except for the "balancedness" property identified in Du et al. [2018].
no code implementations • NeurIPS 2020 • Amit Daniely, Gal Vardi
A natural approach to settle the discrepancy is to assume that the network's weights are "well-behaved" and possess some generic properties that may allow efficient learning.
no code implementations • NeurIPS 2020 • Gal Vardi, Ohad Shamir
To show this, we study a seemingly unrelated problem of independent interest: namely, whether there are polynomially-bounded functions which require super-polynomial weights in order to be approximated by constant-depth neural networks.