Search Results for author: Soufiane Hayou

Found 19 papers, 2 papers with code

How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse

no code implementations7 Apr 2024 Mohamed El Amine Seddik, Suei-Wen Chen, Soufiane Hayou, Pierre Youssef, Merouane Debbah

With the aim of rigorously understanding model collapse in language models, we consider a statistical model that allows us to characterize the impact of various recursive training scenarios.

Language Modelling

LoRA+: Efficient Low Rank Adaptation of Large Models

1 code implementation19 Feb 2024 Soufiane Hayou, Nikhil Ghosh, Bin Yu

In this paper, we show that Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021) leads to suboptimal finetuning of models with large width (embedding dimension).
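
The remedy proposed in LoRA+ is to update the two adapter matrices with different learning rates, with a larger step size for B than for A. Below is a minimal PyTorch sketch of that idea; the layer sizes, rank, and learning-rate ratio are illustrative placeholders, not the values recommended in the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a rank-r LoRA adapter (B @ A)."""
    def __init__(self, d_in, d_out, r=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)          # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, d_in) / d_in ** 0.5)
        self.B = nn.Parameter(torch.zeros(d_out, r))    # B initialized to zero

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

layer = LoRALinear(1024, 1024)
base_lr = 1e-4
optimizer = torch.optim.AdamW([
    {"params": [layer.A], "lr": base_lr},
    {"params": [layer.B], "lr": 16 * base_lr},          # larger learning rate for B (illustrative ratio)
])
```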

Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks

no code implementations3 Oct 2023 Greg Yang, Dingli Yu, Chen Zhu, Soufiane Hayou

By classifying infinite-width neural networks and identifying the *optimal* limit, Tensor Programs IV and V demonstrated a universal way, called $\mu$P, for *widthwise hyperparameter transfer*, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones.
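
To give a rough sense of what widthwise transfer means in practice: the model is parameterized so that the optimal learning rate stops depending on width, the learning rate is tuned once on a narrow proxy, and then reused at scale. The sketch below shows only one ingredient of this (a 1/width multiplier on the readout) and is not the full $\mu$P recipe from Tensor Programs IV/V; the widths and learning rate are placeholders.

```python
import torch
import torch.nn as nn

class MuMLP(nn.Module):
    """Tiny MLP with a 1/width multiplier on the readout, one ingredient of muP."""
    def __init__(self, width, d_in=32, d_out=10):
        super().__init__()
        self.width = width
        self.hidden = nn.Linear(d_in, width)
        self.readout = nn.Linear(width, d_out, bias=False)

    def forward(self, x):
        return self.readout(torch.relu(self.hidden(x))) / self.width

# Tune the learning rate once on a narrow proxy model, then reuse it at larger widths.
base_lr = 0.1                                   # illustrative value "tuned" at width 128
for width in (128, 1024, 8192):
    model = MuMLP(width)
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)
```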

Commutative Width and Depth Scaling in Deep Neural Networks

no code implementations2 Oct 2023 Soufiane Hayou

Our aim is to understand the behaviour of neural functions (functions that depend on a neural network model) as width and depth go to infinity (in some sense), and eventually identify settings under which commutativity holds, i.e., the neural function tends to the same limit no matter how the width and depth limits are taken.

Leave-one-out Distinguishability in Machine Learning

1 code implementation29 Sep 2023 Jiayuan Ye, Anastasia Borovykh, Soufiane Hayou, Reza Shokri

We introduce an analytical framework to quantify the changes in a machine learning algorithm's output distribution following the inclusion of a few data points in its training set, a notion we define as leave-one-out distinguishability (LOOD).
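
A brute-force way to see what such a quantity measures (not the analytical estimator from the paper): retrain the same randomized learner with and without the extra point and compare the two resulting prediction distributions at a query input. The sketch below uses ridge regression on random features purely as a stand-in learner; all sizes and the comparison statistic are placeholders.

```python
import numpy as np

def prediction_distribution(train_x, train_y, query_x, n_seeds=50):
    """Distribution (over training randomness) of the prediction at query_x.
    The learner here is ridge regression on random tanh features, a stand-in
    for any randomized training algorithm."""
    preds = []
    for seed in range(n_seeds):
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(train_x.shape[1], 64))      # random feature map
        phi = np.tanh(train_x @ W)
        phi_q = np.tanh(query_x @ W)
        w = np.linalg.solve(phi.T @ phi + 1e-2 * np.eye(64), phi.T @ train_y)
        preds.append((phi_q @ w).item())
    return np.array(preds)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
x_new, y_new = rng.normal(size=(1, 5)), np.array([1.0])
query = rng.normal(size=(1, 5))

preds_without = prediction_distribution(X, y, query)
preds_with = prediction_distribution(np.vstack([X, x_new]), np.append(y, y_new), query)
print(abs(preds_with.mean() - preds_without.mean()))     # a crude distinguishability signal
```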

Gaussian Processes, Memorization

On the Connection Between Riemann Hypothesis and a Special Class of Neural Networks

no code implementations17 Sep 2023 Soufiane Hayou

In this note, we revisit and extend an old analytic criterion for the Riemann Hypothesis (RH) known as the Nyman-Beurling criterion, which connects the RH to a minimization problem that involves a special class of neural networks.
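
For context, one common formulation of the criterion (the note may use a different but equivalent variant): writing $\{\cdot\}$ for the fractional part, the RH holds if and only if the constant function $1$ can be approximated arbitrarily well in $L^2(0,1)$ by finite combinations $\sum_k c_k \rho_{\alpha_k}$ of the functions $\rho_\alpha$ below. Such combinations can be read as one-hidden-layer networks with a fractional-part-based activation, which hints at the connection to a special class of neural networks.

```latex
% One common formulation of the Nyman-Beurling criterion
\[
  \rho_\alpha(t) \;=\; \Bigl\{\tfrac{\alpha}{t}\Bigr\} \;-\; \alpha\,\Bigl\{\tfrac{1}{t}\Bigr\},
  \qquad \alpha \in (0,1],
\]
\[
  \mathrm{RH} \;\Longleftrightarrow\;
  \inf_{n,\; c_k,\; \alpha_k}\;
  \Bigl\|\, 1 \;-\; \sum_{k=1}^{n} c_k\, \rho_{\alpha_k} \Bigr\|_{L^2(0,1)} \;=\; 0 .
\]
```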

Data pruning and neural scaling laws: fundamental limitations of score-based algorithms

no code implementations14 Feb 2023 Fadhel Ayed, Soufiane Hayou

Data pruning algorithms are commonly used to reduce the memory and computational cost of the optimization process.
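
For reference, a score-based pruning rule in its simplest form: rank examples by a per-example score computed once (e.g. a difficulty or influence proxy) and keep only the top fraction. The sketch below is generic, not a specific algorithm from the paper, and the scores are random placeholders.

```python
import numpy as np

def score_based_prune(scores, keep_fraction=0.5):
    """Keep the indices of the highest-scoring examples, discard the rest."""
    n_keep = int(len(scores) * keep_fraction)
    return np.argsort(scores)[-n_keep:]

rng = np.random.default_rng(0)
scores = rng.random(10_000)                  # placeholder per-example difficulty scores
kept_indices = score_based_prune(scores, keep_fraction=0.3)
```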

Width and Depth Limits Commute in Residual Networks

no code implementations1 Feb 2023 Soufiane Hayou, Greg Yang

We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{depth}$ (the only nontrivial scaling), results in the same covariance structure no matter how that limit is taken.
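
The scaling in question, in code: each residual branch is multiplied by $1/\sqrt{depth}$ before being added back to the skip path. A minimal PyTorch sketch, with width, depth, and activation chosen arbitrarily:

```python
import torch
import torch.nn as nn

class ScaledResNet(nn.Module):
    """Residual MLP whose branches are scaled by 1/sqrt(depth)."""
    def __init__(self, width, depth):
        super().__init__()
        self.depth = depth
        self.blocks = nn.ModuleList(nn.Linear(width, width) for _ in range(depth))

    def forward(self, x):
        scale = self.depth ** -0.5
        for block in self.blocks:
            x = x + scale * torch.relu(block(x))   # scaled residual branch
        return x

net = ScaledResNet(width=512, depth=64)
y = net(torch.randn(8, 512))
```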

On the infinite-depth limit of finite-width neural networks

no code implementations3 Oct 2022 Soufiane Hayou

Unlike the infinite-width limit, where the pre-activations converge weakly to a Gaussian random variable, we show that the infinite-depth limit yields different distributions depending on the choice of the activation function.

Feature Learning and Signal Propagation in Deep Neural Networks

no code implementations22 Oct 2021 Yizhang Lou, Chris Mingard, Yoonsoo Nam, Soufiane Hayou

Recent work by Baratin et al. (2021) sheds light on an intriguing pattern that occurs during the training of deep neural networks: some layers align much more with data than others (where the alignment is defined as the Euclidean product of the tangent features matrix and the data labels matrix).
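
A rough sketch of how such a per-layer alignment can be computed: build the layer's tangent features (gradients of the outputs with respect to that layer's parameters), form the corresponding kernel, and take its normalized inner product with the label Gram matrix. The exact definition and normalization in Baratin et al. (2021) may differ; the architecture and data below are placeholders.

```python
import torch
import torch.nn as nn

def layer_alignment(model, layer_params, x, y):
    """Normalized inner product between a layer's tangent kernel and the label Gram matrix."""
    outputs = model(x).sum(dim=1)                        # one scalar output per example
    feats = []
    for out in outputs:
        grads = torch.autograd.grad(out, layer_params, retain_graph=True)
        feats.append(torch.cat([g.flatten() for g in grads]))
    phi = torch.stack(feats)                             # tangent features of this layer, (n, p)
    K = phi @ phi.T                                      # layer tangent kernel, (n, n)
    Y = y @ y.T                                          # label Gram matrix, (n, n)
    return (K * Y).sum() / (K.norm() * Y.norm())

net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
x, y = torch.randn(32, 16), torch.randn(32, 1)
print(layer_alignment(net, list(net[0].parameters()), x, y))   # alignment of the first layer
```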

Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

no code implementations22 Oct 2021 Soufiane Hayou, Bobby He, Gintare Karolina Dziugaite

In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.

L2 Regularization, regression

The Curse of Depth in Kernel Regime

no code implementations NeurIPS Workshop ICBINB 2021 Soufiane Hayou, Arnaud Doucet, Judith Rousseau

Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with gradient descent is strongly related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK).
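
For a concrete handle on the object involved: the empirical NTK of a scalar-output network evaluates, for each pair of inputs, the inner product of the parameter gradients of the outputs. A small sketch, with an arbitrary architecture and data:

```python
import torch
import torch.nn as nn

def empirical_ntk(model, x):
    """Empirical NTK Gram matrix K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>."""
    params = [p for p in model.parameters() if p.requires_grad]
    outputs = model(x).squeeze(-1)                 # scalar output per example
    rows = []
    for out in outputs:
        grads = torch.autograd.grad(out, params, retain_graph=True)
        rows.append(torch.cat([g.flatten() for g in grads]))
    J = torch.stack(rows)                          # Jacobian of outputs w.r.t. parameters
    return J @ J.T

net = nn.Sequential(nn.Linear(8, 128), nn.Tanh(), nn.Linear(128, 1))
K = empirical_ntk(net, torch.randn(16, 8))         # (16, 16) kernel matrix
```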

Stable ResNet

no code implementations24 Oct 2020 Soufiane Hayou, Eugenio Clerico, Bobby He, George Deligiannidis, Arnaud Doucet, Judith Rousseau

Deep ResNet architectures have achieved state-of-the-art performance on many tasks.

Robust Pruning at Initialization

no code implementations ICLR 2021 Soufiane Hayou, Jean-Francois Ton, Arnaud Doucet, Yee Whye Teh

Overparameterized Neural Networks (NNs) display state-of-the-art performance.

Mean-field Behaviour of Neural Tangent Kernel for Deep Neural Networks

no code implementations31 May 2019 Soufiane Hayou, Arnaud Doucet, Judith Rousseau

Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with gradient descent in parameter space is strongly related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK).

On the Impact of the Activation Function on Deep Neural Networks Training

no code implementations19 Feb 2019 Soufiane Hayou, Arnaud Doucet, Judith Rousseau

The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure.

On the Selection of Initialization and Activation Function for Deep Neural Networks

no code implementations ICLR 2019 Soufiane Hayou, Arnaud Doucet, Judith Rousseau

We complete this analysis by providing quantitative results showing that, for a class of ReLU-like activation functions, information indeed propagates deeper for an initialization at the edge of chaos.
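
In the ReLU case the edge of chaos corresponds to weight variance $2/\text{fan\_in}$ with zero bias variance, i.e. it coincides with He initialization. A minimal sketch of initializing a deep ReLU MLP at that point; the width and depth are placeholders.

```python
import torch.nn as nn

def edge_of_chaos_init(module):
    """Weight variance 2/fan_in and zero bias: the edge-of-chaos point for ReLU."""
    if isinstance(module, nn.Linear):
        fan_in = module.weight.shape[1]
        nn.init.normal_(module.weight, mean=0.0, std=(2.0 / fan_in) ** 0.5)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# A deep ReLU MLP initialized at the edge of chaos.
layers = []
for _ in range(50):
    layers += [nn.Linear(256, 256), nn.ReLU()]
net = nn.Sequential(*layers)
net.apply(edge_of_chaos_init)
```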
