Search Results for author: Soufiane Hayou

Found 19 papers, 2 papers with code

How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse

no code implementations7 Apr 2024 Mohamed El Amine Seddik, Suei-Wen Chen, Soufiane Hayou, Pierre Youssef, Merouane Debbah

With the aim of rigorously understanding model collapse in language models, we consider a statistical model that allows us to characterize the impact of various recursive training scenarios.

Language Modelling

LoRA+: Efficient Low Rank Adaptation of Large Models

1 code implementation19 Feb 2024 Soufiane Hayou, Nikhil Ghosh, Bin Yu

In this paper, we show that Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021) leads to suboptimal finetuning of models with large width (embedding dimension).
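
The remedy proposed in LoRA+ is to update the two adapter matrices with different learning rates, with a larger step size for B than for A. Below is a minimal PyTorch sketch of that idea; the layer sizes, rank, and learning-rate ratio are illustrative placeholders, not the values recommended in the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a rank-r LoRA adapter (B @ A)."""
    def __init__(self, d_in, d_out, r=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)          # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, d_in) / d_in ** 0.5)
        self.B = nn.Parameter(torch.zeros(d_out, r))    # B initialized to zero

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

layer = LoRALinear(1024, 1024)
base_lr = 1e-4
optimizer = torch.optim.AdamW([
    {"params": [layer.A], "lr": base_lr},
    {"params": [layer.B], "lr": 16 * base_lr},          # larger learning rate for B (illustrative ratio)
])
```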

Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks

no code implementations3 Oct 2023 Greg Yang, Dingli Yu, Chen Zhu, Soufiane Hayou

By classifying infinite-width neural networks and identifying the *optimal* limit, Tensor Programs IV and V demonstrated a universal way, called $\mu$P, for *widthwise hyperparameter transfer*, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones.
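
To give a rough sense of what widthwise transfer means in practice: the model is parameterized so that the optimal learning rate stops depending on width, the learning rate is tuned once on a narrow proxy, and then reused at scale. The sketch below shows only one ingredient of this (a 1/width multiplier on the readout) and is not the full $\mu$P recipe from Tensor Programs IV/V; the widths and learning rate are placeholders.

```python
import torch
import torch.nn as nn

class MuMLP(nn.Module):
    """Tiny MLP with a 1/width multiplier on the readout, one ingredient of muP."""
    def __init__(self, width, d_in=32, d_out=10):
        super().__init__()
        self.width = width
        self.hidden = nn.Linear(d_in, width)
        self.readout = nn.Linear(width, d_out, bias=False)

    def forward(self, x):
        return self.readout(torch.relu(self.hidden(x))) / self.width

# Tune the learning rate once on a narrow proxy model, then reuse it at larger widths.
base_lr = 0.1                                   # illustrative value "tuned" at width 128
for width in (128, 1024, 8192):
    model = MuMLP(width)
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)
```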

Commutative Width and Depth Scaling in Deep Neural Networks

no code implementations2 Oct 2023 Soufiane Hayou

Our aim is to understand the behaviour of neural functions (functions that depend on a neural network model) as width and depth go to infinity (in some sense), and eventually identify settings under which commutativity holds, i.e., the neural function tends to the same limit no matter how the width and depth limits are taken.

Leave-one-out Distinguishability in Machine Learning

1 code implementation29 Sep 2023 Jiayuan Ye, Anastasia Borovykh, Soufiane Hayou, Reza Shokri

We introduce an analytical framework to quantify the changes in a machine learning algorithm's output distribution following the inclusion of a few data points in its training set, a notion we define as leave-one-out distinguishability (LOOD).
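
A brute-force way to see what such a quantity measures (not the analytical estimator from the paper): retrain the same randomized learner with and without the extra point and compare the two resulting prediction distributions at a query input. The sketch below uses ridge regression on random features purely as a stand-in learner; all sizes and the comparison statistic are placeholders.

```python
import numpy as np

def prediction_distribution(train_x, train_y, query_x, n_seeds=50):
    """Distribution (over training randomness) of the prediction at query_x.
    The learner here is ridge regression on random tanh features, a stand-in
    for any randomized training algorithm."""
    preds = []
    for seed in range(n_seeds):
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(train_x.shape[1], 64))      # random feature map
        phi = np.tanh(train_x @ W)
        phi_q = np.tanh(query_x @ W)
        w = np.linalg.solve(phi.T @ phi + 1e-2 * np.eye(64), phi.T @ train_y)
        preds.append((phi_q @ w).item())
    return np.array(preds)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
x_new, y_new = rng.normal(size=(1, 5)), np.array([1.0])
query = rng.normal(size=(1, 5))

preds_without = prediction_distribution(X, y, query)
preds_with = prediction_distribution(np.vstack([X, x_new]), np.append(y, y_new), query)
print(abs(preds_with.mean() - preds_without.mean()))     # a crude distinguishability signal
```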

Gaussian Processes, Memorization

On the Connection Between Riemann Hypothesis and a Special Class of Neural Networks

no code implementations17 Sep 2023 Soufiane Hayou

In this note, we revisit and extend an old analytic criterion for the Riemann Hypothesis (RH) known as the Nyman-Beurling criterion, which connects the RH to a minimization problem that involves a special class of neural networks.
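
For context, one common formulation of the criterion (the note may use a different but equivalent variant): writing $\{\cdot\}$ for the fractional part, the RH holds if and only if the constant function $1$ can be approximated arbitrarily well in $L^2(0,1)$ by finite combinations $\sum_k c_k \rho_{\alpha_k}$ of the functions $\rho_\alpha$ below. Such combinations can be read as one-hidden-layer networks with a fractional-part-based activation, which hints at the connection to a special class of neural networks.

```latex
% One common formulation of the Nyman-Beurling criterion
\[
  \rho_\alpha(t) \;=\; \Bigl\{\tfrac{\alpha}{t}\Bigr\} \;-\; \alpha\,\Bigl\{\tfrac{1}{t}\Bigr\},
  \qquad \alpha \in (0,1],
\]
\[
  \mathrm{RH} \;\Longleftrightarrow\;
  \inf_{n,\; c_k,\; \alpha_k}\;
  \Bigl\|\, 1 \;-\; \sum_{k=1}^{n} c_k\, \rho_{\alpha_k} \Bigr\|_{L^2(0,1)} \;=\; 0 .
\]
```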

Data pruning and neural scaling laws: fundamental limitations of score-based algorithms

no code implementations14 Feb 2023 Fadhel Ayed, Soufiane Hayou

Data pruning algorithms are commonly used to reduce the memory and computational cost of the optimization process.
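
For reference, a score-based pruning rule in its simplest form: rank examples by a per-example score computed once (e.g. a difficulty or influence proxy) and keep only the top fraction. The sketch below is generic, not a specific algorithm from the paper, and the scores are random placeholders.

```python
import numpy as np

def score_based_prune(scores, keep_fraction=0.5):
    """Keep the indices of the highest-scoring examples, discard the rest."""
    n_keep = int(len(scores) * keep_fraction)
    return np.argsort(scores)[-n_keep:]

rng = np.random.default_rng(0)
scores = rng.random(10_000)                  # placeholder per-example difficulty scores
kept_indices = score_based_prune(scores, keep_fraction=0.3)
```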

Width and Depth Limits Commute in Residual Networks

no code implementations1 Feb 2023 Soufiane Hayou, Greg Yang

We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{depth}$ (the only nontrivial scaling), results in the same covariance structure no matter how that limit is taken.
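
The scaling in question, in code: each residual branch is multiplied by $1/\sqrt{depth}$ before being added back to the skip path. A minimal PyTorch sketch, with width, depth, and activation chosen arbitrarily:

```python
import torch
import torch.nn as nn

class ScaledResNet(nn.Module):
    """Residual MLP whose branches are scaled by 1/sqrt(depth)."""
    def __init__(self, width, depth):
        super().__init__()
        self.depth = depth
        self.blocks = nn.ModuleList(nn.Linear(width, width) for _ in range(depth))

    def forward(self, x):
        scale = self.depth ** -0.5
        for block in self.blocks:
            x = x + scale * torch.relu(block(x))   # scaled residual branch
        return x

net = ScaledResNet(width=512, depth=64)
y = net(torch.randn(8, 512))
```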

On the infinite-depth limit of finite-width neural networks

no code implementations3 Oct 2022 Soufiane Hayou

Unlike the infinite-width limit, where the pre-activations converge weakly to a Gaussian random variable, we show that the infinite-depth limit yields different distributions depending on the choice of the activation function.

Feature Learning and Signal Propagation in Deep Neural Networks

no code implementations22 Oct 2021 Yizhang Lou, Chris Mingard, Yoonsoo Nam, Soufiane Hayou

Recent work by Baratin et al. (2021) sheds light on an intriguing pattern that occurs during the training of deep neural networks: some layers align much more with data than others (where the alignment is defined as the Euclidean product of the tangent features matrix and the data labels matrix).
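
A rough sketch of how such a per-layer alignment can be computed: build the layer's tangent features (gradients of the outputs with respect to that layer's parameters), form the corresponding kernel, and take its normalized inner product with the label Gram matrix. The exact definition and normalization in Baratin et al. (2021) may differ; the architecture and data below are placeholders.

```python
import torch
import torch.nn as nn

def layer_alignment(model, layer_params, x, y):
    """Normalized inner product between a layer's tangent kernel and the label Gram matrix."""
    outputs = model(x).sum(dim=1)                        # one scalar output per example
    feats = []
    for out in outputs:
        grads = torch.autograd.grad(out, layer_params, retain_graph=True)
        feats.append(torch.cat([g.flatten() for g in grads]))
    phi = torch.stack(feats)                             # tangent features of this layer, (n, p)
    K = phi @ phi.T                                      # layer tangent kernel, (n, n)
    Y = y @ y.T                                          # label Gram matrix, (n, n)
    return (K * Y).sum() / (K.norm() * Y.norm())

net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
x, y = torch.randn(32, 16), torch.randn(32, 1)
print(layer_alignment(net, list(net[0].parameters()), x, y))   # alignment of the first layer
```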

Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

no code implementations22 Oct 2021 Soufiane Hayou, Bobby He, Gintare Karolina Dziugaite

In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.

L2 Regularization, regression

The Curse of Depth in Kernel Regime

no code implementations NeurIPS Workshop ICBINB 2021 Soufiane Hayou, Arnaud Doucet, Judith Rousseau

Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with gradient descent is strongly related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK).
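
For a concrete handle on the object involved: the empirical NTK of a scalar-output network evaluates, for each pair of inputs, the inner product of the parameter gradients of the outputs. A small sketch, with an arbitrary architecture and data:

```python
import torch
import torch.nn as nn

def empirical_ntk(model, x):
    """Empirical NTK Gram matrix K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>."""
    params = [p for p in model.parameters() if p.requires_grad]
    outputs = model(x).squeeze(-1)                 # scalar output per example
    rows = []
    for out in outputs:
        grads = torch.autograd.grad(out, params, retain_graph=True)
        rows.append(torch.cat([g.flatten() for g in grads]))
    J = torch.stack(rows)                          # Jacobian of outputs w.r.t. parameters
    return J @ J.T

net = nn.Sequential(nn.Linear(8, 128), nn.Tanh(), nn.Linear(128, 1))
K = empirical_ntk(net, torch.randn(16, 8))         # (16, 16) kernel matrix
```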

Stable ResNet

no code implementations24 Oct 2020 Soufiane Hayou, Eugenio Clerico, Bobby He, George Deligiannidis, Arnaud Doucet, Judith Rousseau

Deep ResNet architectures have achieved state-of-the-art performance on many tasks.

Robust Pruning at Initialization

no code implementations ICLR 2021 Soufiane Hayou, Jean-Francois Ton, Arnaud Doucet, Yee Whye Teh

Overparameterized Neural Networks (NNs) display state-of-the-art performance.

Mean-field Behaviour of Neural Tangent Kernel for Deep Neural Networks

no code implementations31 May 2019 Soufiane Hayou, Arnaud Doucet, Judith Rousseau

Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with gradient descent in parameter space is strongly related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK).

On the Impact of the Activation Function on Deep Neural Networks Training

no code implementations19 Feb 2019 Soufiane Hayou, Arnaud Doucet, Judith Rousseau

The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure.

On the Selection of Initialization and Activation Function for Deep Neural Networks

no code implementations ICLR 2019 Soufiane Hayou, Arnaud Doucet, Judith Rousseau

We complete this analysis by providing quantitative results showing that, for a class of ReLU-like activation functions, information indeed propagates deeper for an initialization at the edge of chaos.
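
In the ReLU case the edge of chaos corresponds to weight variance $2/\text{fan\_in}$ with zero bias variance, i.e. it coincides with He initialization. A minimal sketch of initializing a deep ReLU MLP at that point; the width and depth are placeholders.

```python
import torch.nn as nn

def edge_of_chaos_init(module):
    """Weight variance 2/fan_in and zero bias: the edge-of-chaos point for ReLU."""
    if isinstance(module, nn.Linear):
        fan_in = module.weight.shape[1]
        nn.init.normal_(module.weight, mean=0.0, std=(2.0 / fan_in) ** 0.5)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# A deep ReLU MLP initialized at the edge of chaos.
layers = []
for _ in range(50):
    layers += [nn.Linear(256, 256), nn.ReLU()]
net = nn.Sequential(*layers)
net.apply(edge_of_chaos_init)
```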
