1 code implementation • 13 Mar 2024 • Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish
In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.
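Of the three ingredients named above, replay is the easiest to show in code. Each batch reserves a fixed share of its slots for examples drawn from the previous data. This is a minimal sketch under assumptions: the helper name `make_replay_batch`, uniform sampling from an in-memory pool, and the 25% replay share in the usage lines are illustrative choices, not the paper's configuration.

```python
import random
from typing import Iterator, List, Sequence


def make_replay_batch(
    new_stream: Iterator[str],
    old_pool: Sequence[str],
    batch_size: int,
    replay_fraction: float,
    rng: random.Random,
) -> List[str]:
    """Build one batch mixing new data with replayed previous data.

    Hypothetical helper: `replay_fraction` is the share of each batch
    drawn from the previous (upstream) dataset.
    """
    if not 0.0 <= replay_fraction <= 1.0:
        raise ValueError("replay_fraction must be in [0, 1]")
    n_replay = round(batch_size * replay_fraction)
    # Fill the remaining slots from the new (downstream) stream, in order.
    batch = [next(new_stream) for _ in range(batch_size - n_replay)]
    # Replay: sample previous-data examples uniformly without replacement.
    batch.extend(rng.sample(list(old_pool), n_replay))
    # Shuffle so replayed examples are not clustered at the end of the batch.
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    new_stream = iter([f"new-{i}" for i in range(1000)])
    old_pool = [f"old-{i}" for i in range(100)]
    batch = make_replay_batch(new_stream, old_pool, batch_size=20,
                              replay_fraction=0.25, rng=rng)
    print(sum(x.startswith("old") for x in batch))  # 5
```

Using a fixed per-batch count keeps the replay share exact in every batch rather than only correct in expectation; a production pipeline would stream the old data instead of holding it in a list.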
no code implementations • 2 Dec 2023 • Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky
Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art adaptive optimizers for deep learning.
2 code implementations • 8 Aug 2023 • Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort
We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.
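The linear warmup and cosine decay schedule named in this snippet can be written as a function of the step count. This is a minimal sketch: the function name and the example hyperparameters (peak LR 3e-4, floor 3e-5, 100 warmup steps out of 1000) are hypothetical and not taken from the paper. Re-warming for continued pre-training corresponds to restarting the step counter at 0 on the new data, so the LR climbs back to `max_lr` and then decays again.

```python
import math


def warmup_cosine_lr(step: int, max_lr: float, min_lr: float,
                     warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear warmup from 0 up to max_lr.
        return max_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine decay: equals max_lr at progress 0 and min_lr at progress 1.
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical settings, for illustration only.
    for step in (0, 50, 100, 550, 1000):
        print(step, warmup_cosine_lr(step, 3e-4, 3e-5, 100, 1000))
```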
no code implementations • 17 May 2023 • Benjamin Thérien, Chengjie Huang, Adrian Chow, Krzysztof Czarnecki
To our knowledge, we are the first to study object re-identification from real point cloud observations.
no code implementations • 5 Oct 2022 • Luke Rowe, Benjamin Thérien, Krzysztof Czarnecki, Hongyang Zhang
In adversarial machine learning, the popular $\ell_\infty$ threat model has been the focus of much previous work.
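Under the $\ell_\infty$ threat model, an adversary may change every input coordinate by at most $\epsilon$, i.e. $\|x' - x\|_\infty \le \epsilon$. Projecting a candidate perturbation back into that set is coordinate-wise clipping, the step used by attacks such as PGD. The sketch below is illustrative only: the helper names, the extra clipping to a valid input range $[0, 1]$, and the example values are assumptions, not taken from the paper.

```python
from typing import List, Sequence


def linf_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """l_inf distance: the largest absolute coordinate-wise difference."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


def project_linf(x_adv: Sequence[float], x: Sequence[float], eps: float,
                 lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Project x_adv onto {x' : ||x' - x||_inf <= eps}, then into [lo, hi].

    Assumes the clean input x already lies in [lo, hi], so the second
    clip cannot push a coordinate back outside the eps-ball.
    """
    out = []
    for a, xi in zip(x_adv, x):
        a = min(max(a, xi - eps), xi + eps)  # stay within eps of the clean value
        a = min(max(a, lo), hi)              # stay a valid input (e.g. pixel range)
        out.append(a)
    return out


if __name__ == "__main__":
    x = [0.5, 0.2, 0.99]      # clean input (illustrative values)
    x_adv = [0.6, 0.1, 1.2]   # candidate perturbed input
    x_proj = project_linf(x_adv, x, eps=0.03)
    print(x_proj, linf_distance(x_proj, x))
```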
no code implementations • 3 Oct 2022 • Benjamin Thérien, Krzysztof Czarnecki
By enumerating different tracking decisions and associated reasoning procedures, we can train individual networks to reason about the possible decisions via IIT.
no code implementations • 29 Sep 2021 • Shanel Gauthier, Benjamin Thérien, Laurent Alsène-Racicot, Muawiz Sajjad Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf
The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet.
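A parameterized mother wavelet generates the whole filter bank by dilation, so a handful of parameters fixes every filter. The sketch assumes a 1D complex Morlet wavelet with centre frequency `xi` and width `sigma`, and dyadic dilations $\psi_j(t) = 2^{-j}\psi(2^{-j}t)$. The wavelet family, function names, and default values are illustrative assumptions, and the code does not check the tight-frame property mentioned above; it only verifies numerically that each filter has zero mean.

```python
import cmath
import math
from typing import List, Tuple


def morlet(t: float, xi: float = 5.0, sigma: float = 1.0) -> complex:
    """Complex Morlet mother wavelet (1D, illustrative parameters).

    The subtracted constant exp(-(xi*sigma)**2 / 2) makes the wavelet
    integrate to zero, i.e. it is a band-pass (admissible) filter.
    """
    envelope = math.exp(-t * t / (2.0 * sigma * sigma))
    correction = math.exp(-((xi * sigma) ** 2) / 2.0)
    return (cmath.exp(1j * xi * t) - correction) * envelope


def filter_bank(num_scales: int, xi: float, sigma: float, dt: float,
                half_width: float) -> Tuple[List[float], List[List[complex]]]:
    """Sample psi_j(t) = 2**-j * psi(2**-j * t) on a grid, j = 0..num_scales-1."""
    n = int(round(half_width / dt))
    grid = [k * dt for k in range(-n, n + 1)]
    bank = []
    for j in range(num_scales):
        s = 2.0 ** j  # dyadic dilation factor
        bank.append([morlet(t / s, xi, sigma) / s for t in grid])
    return grid, bank


if __name__ == "__main__":
    grid, bank = filter_bank(num_scales=3, xi=5.0, sigma=1.0, dt=0.01, half_width=40.0)
    for j, psi_j in enumerate(bank):
        # Riemann-sum approximation of the integral of psi_j (should be ~0).
        print(j, abs(sum(psi_j) * 0.01))
```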
1 code implementation • CVPR 2022 • Shanel Gauthier, Benjamin Thérien, Laurent Alsène-Racicot, Muawiz Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf
The wavelet scattering transform creates geometric invariants and deformation stability.
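To make the invariance claim concrete, a first-order scattering coefficient takes the modulus of a wavelet convolution and then averages it. The sketch below assumes a Haar wavelet (chosen for brevity; the snippet does not name a wavelet), circular convolution, and a global average in place of a low-pass filter, which makes the coefficients invariant to circular shifts up to floating-point rounding. A finite-scale low-pass filter would instead give only local invariance, and deformation stability is not demonstrated here; all helper names are hypothetical.

```python
import math
from typing import List, Sequence


def haar_filter(j: int) -> List[float]:
    """Haar wavelet at scale 2**j: 2**j positive then 2**j negative taps, L1-normalised."""
    half = 2 ** j
    return [1.0 / (2 * half)] * half + [-1.0 / (2 * half)] * half


def circular_conv(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Circular convolution y[n] = sum_k h[k] * x[(n - k) mod N]."""
    n = len(x)
    return [sum(hk * x[(i - k) % n] for k, hk in enumerate(h)) for i in range(n)]


def first_order_scattering(x: Sequence[float], num_scales: int) -> List[float]:
    """S1[j] = global mean of |x * psi_j| (modulus, then averaging)."""
    coeffs = []
    for j in range(num_scales):
        u = circular_conv(x, haar_filter(j))
        coeffs.append(sum(abs(v) for v in u) / len(u))
    return coeffs


def circular_shift(x: Sequence[float], s: int) -> List[float]:
    """Shift x right by s positions with wrap-around."""
    s %= len(x)
    return list(x[-s:]) + list(x[:-s])


if __name__ == "__main__":
    x = [math.sin(0.4 * i) + 0.5 * math.sin(1.7 * i) for i in range(64)]
    original = first_order_scattering(x, num_scales=3)
    shifted = first_order_scattering(circular_shift(x, 10), num_scales=3)
    print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(original, shifted)))  # True
```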