no code implementations • 26 May 2024 • Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge
Zamba is pretrained in two phases: the first uses existing web datasets, while the second anneals the model on high-quality instruct and synthetic datasets and is characterized by a rapid learning-rate decay.
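A minimal sketch of what such a two-phase schedule could look like, assuming a hypothetical helper lr_at_step with made-up step counts and learning rates (not Zamba's actual hyperparameters):

    import math

    def lr_at_step(step, phase1_steps=100_000, anneal_steps=10_000,
                   peak_lr=1.5e-3, anneal_start_lr=1.5e-4, final_lr=0.0):
        # Phase 1: pretraining on web data with a slow cosine decay
        # from peak_lr down to anneal_start_lr.
        if step < phase1_steps:
            progress = step / phase1_steps
            return anneal_start_lr + 0.5 * (peak_lr - anneal_start_lr) * (1 + math.cos(math.pi * progress))
        # Phase 2: annealing on high-quality instruct/synthetic data,
        # with a rapid linear decay down to final_lr.
        progress = min((step - phase1_steps) / anneal_steps, 1.0)
        return anneal_start_lr + (final_lr - anneal_start_lr) * progress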
1 code implementation • 13 Mar 2024 • Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish
In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.
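Illustratively, replay can be implemented as a mixture over old and new data while the learning-rate schedule is restarted (re-warmed and then re-decayed) on the continued run; the generator below is a hypothetical sketch, with replay_pct and the document sources as placeholders rather than the paper's setup:

    import random

    def replay_stream(upstream_docs, downstream_docs, replay_pct=5.0):
        # Yield documents for continued pretraining: with probability
        # replay_pct/100 a document is replayed from the previous (upstream)
        # dataset, otherwise it is drawn from the new (downstream) dataset.
        while True:
            if random.random() < replay_pct / 100.0:
                yield random.choice(upstream_docs)
            else:
                yield random.choice(downstream_docs)

A warmup-plus-cosine schedule of the kind being re-warmed and re-decayed here is sketched after the next entry.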
2 code implementations • 8 Aug 2023 • Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort
We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.
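For reference, a linear-warmup, cosine-decay schedule of this general shape can be written as follows; the hyperparameter values are hypothetical, not the paper's:

    import math

    def warmup_cosine_lr(step, max_steps, warmup_steps=1_000,
                         peak_lr=3e-4, min_lr=3e-5):
        # Linear warmup from 0 to peak_lr, then cosine decay to min_lr.
        if step < warmup_steps:
            return peak_lr * step / warmup_steps
        progress = (step - warmup_steps) / max(max_steps - warmup_steps, 1)
        return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))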
1 code implementation • 6 Oct 2022 • Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan
Compared to existing methods, we obtain similar or superior worst-case adversarial robustness on attacks seen during training.
no code implementations • 30 Sep 2022 • Pouya Bashivan, Adam Ibrahim, Amirozhan Dehghani, Yifei Ren
Model ensembles have long been used in machine learning to reduce the variance in individual model predictions, making them more robust to input perturbations.
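As a generic illustration of this variance-reduction idea (not the paper's method), ensembling amounts to averaging member predictions; the sketch below assumes sklearn-style models exposing predict_proba:

    import numpy as np

    def ensemble_predict(models, x):
        # Average class-probability outputs across ensemble members;
        # averaging reduces the variance of any single model's prediction.
        probs = [m.predict_proba(x) for m in models]
        return np.mean(probs, axis=0)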
1 code implementation • NeurIPS 2021 • Pouya Bashivan, Reza Bayat, Adam Ibrahim, Kartik Ahuja, Mojtaba Faramarzi, Touraj Laleh, Blake Aaron Richards, Irina Rish
Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant towards adversarial perturbations of the inputs.
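A rough sketch of the feature-invariance idea, assuming PyTorch modules encoder and classifier and a pre-computed adversarial batch x_adv; this is an illustrative penalty only, not the paper's exact AFD objective:

    import torch.nn.functional as F

    def feature_invariance_loss(encoder, classifier, x_clean, x_adv, y):
        # Classify clean inputs while penalizing the distance between the
        # features of clean inputs and their adversarial counterparts,
        # pushing the learned features toward invariance to the perturbation.
        # (Illustrative surrogate, not the paper's exact AFD objective.)
        f_clean = encoder(x_clean)
        f_adv = encoder(x_adv)
        task_loss = F.cross_entropy(classifier(f_clean), y)
        invariance = F.mse_loss(f_adv, f_clean.detach())
        return task_loss + invariance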
no code implementations • ICML 2020 • Adam Ibrahim, Waïss Azizian, Gauthier Gidel, Ioannis Mitliagkas
In this work, we approach the question of fundamental iteration complexity by providing lower bounds to complement the linear (i.e. geometric) upper bounds observed in the literature on a wide class of problems.
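To make "linear (i.e. geometric)" concrete: a rate is linear if the distance to the solution contracts by a constant factor each iteration, and a lower bound of the same form shows that no method in the class can contract faster. In generic notation (not the paper's):

    \[
      \|z_k - z^\ast\| \;\le\; C\,\rho^{k} \quad\text{(upper bound)},
      \qquad
      \|z_k - z^\ast\| \;\ge\; c\,\rho_{\mathrm{lb}}^{k} \quad\text{(lower bound)},
      \qquad 0 < \rho_{\mathrm{lb}} \le \rho < 1 .
    \]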