no code implementations • 26 May 2024 • Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge
Zamba is pretrained in two phases: the first uses existing web datasets, while the second anneals the model on high-quality instruct and synthetic datasets and is characterized by a rapid learning-rate decay.
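A minimal sketch of what such a two-phase schedule could look like, assuming a hypothetical helper lr_at_step with made-up step counts and learning rates (not Zamba's actual hyperparameters):

    import math

    def lr_at_step(step, phase1_steps=100_000, anneal_steps=10_000,
                   peak_lr=1.5e-3, anneal_start_lr=1.5e-4, final_lr=0.0):
        # Phase 1: pretraining on web data with a slow cosine decay
        # from peak_lr down to anneal_start_lr.
        if step < phase1_steps:
            progress = step / phase1_steps
            return anneal_start_lr + 0.5 * (peak_lr - anneal_start_lr) * (1 + math.cos(math.pi * progress))
        # Phase 2: annealing on high-quality instruct/synthetic data,
        # with a rapid linear decay down to final_lr.
        progress = min((step - phase1_steps) / anneal_steps, 1.0)
        return anneal_start_lr + (final_lr - anneal_start_lr) * progress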
1 code implementation • 13 Mar 2024 • Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish
In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.
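Illustratively, replay can be implemented as a mixture over old and new data while the learning-rate schedule is restarted (re-warmed and then re-decayed) on the continued run; the generator below is a hypothetical sketch, with replay_pct and the document sources as placeholders rather than the paper's setup:

    import random

    def replay_stream(upstream_docs, downstream_docs, replay_pct=5.0):
        # Yield documents for continued pretraining: with probability
        # replay_pct/100 a document is replayed from the previous (upstream)
        # dataset, otherwise it is drawn from the new (downstream) dataset.
        while True:
            if random.random() < replay_pct / 100.0:
                yield random.choice(upstream_docs)
            else:
                yield random.choice(downstream_docs)

A warmup-plus-cosine schedule of the kind being re-warmed and re-decayed here is sketched after the next entry.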
2 code implementations • 8 Aug 2023 • Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort
We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.
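For reference, a linear-warmup, cosine-decay schedule of this general shape can be written as follows; the hyperparameter values are hypothetical, not the paper's:

    import math

    def warmup_cosine_lr(step, max_steps, warmup_steps=1_000,
                         peak_lr=3e-4, min_lr=3e-5):
        # Linear warmup from 0 to peak_lr, then cosine decay to min_lr.
        if step < warmup_steps:
            return peak_lr * step / warmup_steps
        progress = (step - warmup_steps) / max(max_steps - warmup_steps, 1)
        return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))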
1 code implementation • 6 Oct 2022 • Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan
Compared to existing methods, we obtain similar or superior worst-case adversarial robustness on attacks seen during training.
no code implementations • 30 Sep 2022 • Pouya Bashivan, Adam Ibrahim, Amirozhan Dehghani, Yifei Ren
Model ensembles have long been used in machine learning to reduce the variance in individual model predictions, making them more robust to input perturbations.
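As a generic illustration of this variance-reduction idea (not the paper's method), ensembling amounts to averaging member predictions; the sketch below assumes sklearn-style models exposing predict_proba:

    import numpy as np

    def ensemble_predict(models, x):
        # Average class-probability outputs across ensemble members;
        # averaging reduces the variance of any single model's prediction.
        probs = [m.predict_proba(x) for m in models]
        return np.mean(probs, axis=0)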
1 code implementation • NeurIPS 2021 • Pouya Bashivan, Reza Bayat, Adam Ibrahim, Kartik Ahuja, Mojtaba Faramarzi, Touraj Laleh, Blake Aaron Richards, Irina Rish
Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant towards adversarial perturbations of the inputs.
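A rough sketch of the feature-invariance idea, assuming PyTorch modules encoder and classifier and a pre-computed adversarial batch x_adv; this is an illustrative penalty only, not the paper's exact AFD objective:

    import torch.nn.functional as F

    def feature_invariance_loss(encoder, classifier, x_clean, x_adv, y):
        # Classify clean inputs while penalizing the distance between the
        # features of clean inputs and their adversarial counterparts,
        # pushing the learned features toward invariance to the perturbation.
        # (Illustrative surrogate, not the paper's exact AFD objective.)
        f_clean = encoder(x_clean)
        f_adv = encoder(x_adv)
        task_loss = F.cross_entropy(classifier(f_clean), y)
        invariance = F.mse_loss(f_adv, f_clean.detach())
        return task_loss + invariance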
no code implementations • ICML 2020 • Adam Ibrahim, Waïss Azizian, Gauthier Gidel, Ioannis Mitliagkas
In this work, we approach the question of fundamental iteration complexity by providing lower bounds to complement the linear (i.e. geometric) upper bounds observed in the literature on a wide class of problems.
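To make "linear (i.e. geometric)" concrete: a rate is linear if the distance to the solution contracts by a constant factor each iteration, and a lower bound of the same form shows that no method in the class can contract faster. In generic notation (not the paper's):

    \[
      \|z_k - z^\ast\| \;\le\; C\,\rho^{k} \quad\text{(upper bound)},
      \qquad
      \|z_k - z^\ast\| \;\ge\; c\,\rho_{\mathrm{lb}}^{k} \quad\text{(lower bound)},
      \qquad 0 < \rho_{\mathrm{lb}} \le \rho < 1 .
    \]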