no code implementations • 4 Oct 2023 • Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu
Second, they can undergo a period of classical, harmful overfitting -- achieving a perfect fit to training data with near-random performance on test data -- before transitioning ("grokking") to near-optimal generalization later in training.
no code implementations • 28 Jul 2023 • Nirmit Joshi, Gal Vardi, Nathan Srebro
We show overfitting is tempered (with high probability) when measured with respect to the $L_1$ loss, but also show that the situation is more complex than suggested by Mallinar et al.
no code implementations • 22 Jun 2023 • Lijia Zhou, James B. Simon, Gal Vardi, Nathan Srebro
We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model.
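The cost of overfitting defined above is easy to compute numerically on a toy problem: fit a ridgeless (vanishingly small ridge) KRR interpolant and the best model over a ridge grid on noisy data, then take the ratio of their test errors. This is an illustrative sketch, not the paper's construction; the RBF kernel, data distribution, and grid are all assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def krr_test_error(Xtr, ytr, Xte, yte, lam):
    # Kernel ridge regression: alpha = (K + lam * n * I)^{-1} y.
    n = len(Xtr)
    K = rbf_kernel(Xtr, Xtr)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), ytr)
    pred = rbf_kernel(Xte, Xtr) @ alpha
    return np.mean((pred - yte) ** 2)

# Noisy 1-D regression data; test error is measured against the clean target.
Xtr = rng.uniform(-1, 1, (40, 1))
ytr = np.sin(3 * Xtr[:, 0]) + 0.3 * rng.standard_normal(40)
Xte = rng.uniform(-1, 1, (500, 1))
yte = np.sin(3 * Xte[:, 0])

grid = np.logspace(-10, 1, 45)          # includes the ridgeless endpoint
ridgeless = krr_test_error(Xtr, ytr, Xte, yte, lam=1e-10)
tuned = min(krr_test_error(Xtr, ytr, Xte, yte, lam=l) for l in grid)
print("cost of overfitting ~", ridgeless / tuned)
```

Since the grid contains the ridgeless value itself, the ratio is at least 1 by construction; how much larger it is measures how harmful interpolating the noise was.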
no code implementations • 5 May 2023 • Gon Buzaglo, Niv Haim, Gilad Yehudai, Gal Vardi, Michal Irani
Reconstructing samples from the training set of trained neural networks is a major privacy concern.
no code implementations • 2 Mar 2023 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization.
no code implementations • 13 Oct 2022 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data.
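The near-orthogonality property mentioned above is a generic feature of high-dimensional data and is easy to verify numerically: for independent Gaussian samples, pairwise inner products scale like $\sqrt{d}$ while squared norms scale like $d$. A minimal check (illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 10_000            # few samples, high dimension
X = rng.standard_normal((n, d))

G = X @ X.T                  # Gram matrix
sq_norms = np.diag(G)        # ||x_i||^2 concentrates around d
off_diag = G[~np.eye(n, dtype=bool)]

# |<x_i, x_j>| is O(sqrt(d)), far below ||x_i||^2 = O(d) for i != j.
print("min squared norm:", sq_norms.min())
print("max off-diagonal |inner product|:", np.abs(off_diag).max())
```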
no code implementations • 26 Aug 2022 • Gal Vardi
Gradient-based deep-learning algorithms exhibit remarkable performance in practice, but it is not well-understood why they are able to generalize despite having more parameters than training examples.
1 code implementation • 15 Jun 2022 • Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, Michal Irani
We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods.
no code implementations • 18 May 2022 • Itay Safran, Gal Vardi, Jason D. Lee
We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks with a single hidden layer in a binary classification setting.
no code implementations • 13 Feb 2022 • Gal Vardi, Ohad Shamir, Nathan Srebro
We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm.
no code implementations • 9 Feb 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir
Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples.
no code implementations • 8 Feb 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir
We solve an open question from Lu et al. (2017), by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a network of width $O(d)$ (independent of the target network's architecture), whose number of parameters is larger essentially only by a linear factor.
no code implementations • 30 Jan 2022 • Nadav Timor, Gal Vardi, Ohad Shamir
We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices.
no code implementations • ICLR 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir
We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.
no code implementations • 6 Oct 2021 • Gal Vardi, Ohad Shamir, Nathan Srebro
The implicit bias of neural networks has been extensively studied in recent years.
no code implementations • NeurIPS 2021 • Gal Vardi, Gilad Yehudai, Ohad Shamir
We theoretically study the fundamental problem of learning a single neuron with a bias term ($\mathbf{x} \mapsto \sigma(\langle \mathbf{w},\mathbf{x}\rangle + b)$) in the realizable setting with the ReLU activation, using gradient descent.
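The setting above can be sketched in a few lines: labels are generated by a ground-truth ReLU neuron with a bias (the realizable case), and plain gradient descent on the squared loss is run from a small random initialization. This is a hedged toy illustration under assumed hyperparameters, not the paper's analysis; note that initializing at exactly zero would leave every ReLU inactive and the gradient identically zero.

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda z: np.maximum(z, 0.0)

# Realizable setting: labels come from a ground-truth neuron (w*, b*).
d, n = 5, 1000
w_star = rng.standard_normal(d)
b_star = 0.5
X = rng.standard_normal((n, d))
y = relu(X @ w_star + b_star)

# Gradient descent on the squared loss over (w, b), small random init.
w = 0.1 * rng.standard_normal(d)
b = 0.1
lr = 0.2
for _ in range(5000):
    z = X @ w + b
    grad_out = (relu(z) - y) * (z > 0) / n   # chain rule through the ReLU
    w -= lr * (X.T @ grad_out)
    b -= lr * grad_out.sum()

loss = np.mean((relu(X @ w + b) - y) ** 2)
print("final training loss:", loss)
```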
no code implementations • 30 Jan 2021 • Gal Vardi, Daniel Reichman, Toniann Pitassi, Ohad Shamir
We show a complexity-theoretic barrier to proving such results beyond size $O(d\log^2(d))$, but also show an explicit benign function, that can be approximated with networks of size $O(d)$ and not with networks of size $o(d/\log d)$.
no code implementations • 20 Jan 2021 • Amit Daniely, Gal Vardi
We also establish lower bounds on the complexity of learning intersections of a constant number of halfspaces, and ReLU networks with a constant number of hidden neurons.
1 code implementation • 9 Dec 2020 • Gal Vardi, Ohad Shamir
For one hidden-layer networks, we prove a similar result, where in general it is impossible to characterize implicit regularization properties in this manner, except for the "balancedness" property identified in Du et al. [2018].
no code implementations • NeurIPS 2020 • Amit Daniely, Gal Vardi
A natural approach to settle the discrepancy is to assume that the network's weights are "well-behaved" and possess some generic properties that may allow efficient learning.
no code implementations • NeurIPS 2020 • Gal Vardi, Ohad Shamir
To show this, we study a seemingly unrelated problem of independent interest: namely, whether there are polynomially-bounded functions which require super-polynomial weights in order to be approximated by constant-depth neural networks.