no code implementations • 30 Nov 2021 • Yunfei Teng, Jing Wang, Anna Choromanska
Modern deep learning (DL) architectures are trained using variants of the SGD algorithm that are run with a $\textit{manually}$ defined learning rate schedule, i.e., the learning rate is dropped at pre-defined epochs, typically when the training loss is expected to saturate.
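As an illustration of such a manually defined schedule, here is a minimal sketch assuming a PyTorch setup; the model, milestone epochs, and decay factor are hypothetical placeholders, not values from the paper.

```python
import torch

# Hypothetical model and optimizer, chosen only to make the sketch runnable.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# A manually defined schedule: drop the learning rate by 10x at epochs 30 and
# 60 (illustrative milestones), roughly where the training loss is expected
# to saturate.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60], gamma=0.1)

for epoch in range(90):
    # ... one pass over the training data (forward/backward/step) goes here ...
    scheduler.step()  # advance the manual schedule once per epoch
```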
1 code implementation • 25 Nov 2020 • Yunfei Teng, Anna Choromanska, Murray Campbell, Songtao Lu, Parikshit Ram, Lior Horesh
We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone, whereas this is not the case for the remaining directions.
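A minimal numpy sketch of this kind of analysis, assuming the post-convergence iterates are stacked as rows of a matrix; the data, the number of top directions, and the step size are assumptions for illustration.

```python
import numpy as np

# Hypothetical trajectory: T flattened parameter snapshots collected after
# the optimizer has converged, stacked as rows of a (T, d) matrix.
trajectory = np.random.randn(200, 1000)  # placeholder data

# Center the iterates and take the SVD; the right singular vectors are the
# principal directions of the trajectory.
mean = trajectory.mean(axis=0)
_, _, vt = np.linalg.svd(trajectory - mean, full_matrices=False)

k = 5                    # number of top principal directions (illustrative)
top_directions = vt[:k]  # shape (k, d)

# Travel from the converged point along a top principal direction; per the
# observation above, a few such steps can leave the cone quickly, unlike
# steps along the remaining low-variance directions.
theta = trajectory[-1]
step_size = 1.0          # illustrative
perturbed = theta + step_size * top_directions[0]
```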
1 code implementation • NeurIPS 2019 • Yunfei Teng, Wenbo Gao, Francois Chalus, Anna Choromanska, Donald Goldfarb, Adrian Weller
Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in the group), and update each worker with a corrective direction composed of two attractive forces: one toward the local leader and one toward the global leader (the best performer among all workers).
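A synchronous toy sketch of the two-force corrective update described here; the pull strengths `lam_local` and `lam_global`, the quadratic toy loss, and all names are assumptions for illustration, not the paper's (asynchronous) implementation.

```python
import numpy as np

def corrective_step(w, grad, local_leader, global_leader,
                    lr=0.01, lam_local=0.1, lam_global=0.01):
    """One worker update with two attractive forces: a pull toward the local
    leader (best in the worker's group) and a weaker pull toward the global
    leader (best among all workers). Step size and pull strengths are
    illustrative, not values from the paper."""
    return (w
            - lr * grad
            + lam_local * (local_leader - w)
            + lam_global * (global_leader - w))

# Toy usage: three groups of workers minimizing f(w) = 0.5 * ||w||^2,
# whose gradient is simply w.
d, workers_per_group, n_groups = 10, 4, 3
groups = [np.random.randn(workers_per_group, d) for _ in range(n_groups)]

for _ in range(100):
    losses = [0.5 * (g ** 2).sum(axis=1) for g in groups]
    local_leaders = [g[l.argmin()] for g, l in zip(groups, losses)]
    global_leader = min(local_leaders, key=lambda w: 0.5 * (w ** 2).sum())
    for g, leader in zip(groups, local_leaders):
        for i in range(workers_per_group):
            g[i] = corrective_step(g[i], g[i], leader, global_leader)
```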
no code implementations • 10 Feb 2018 • Yunfei Teng, Anna Choromanska, Mariusz Bojarski
However, it does not explicitly enforce $F_{BA}$ to be the inverse of $F_{AB}$.
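For context, a minimal PyTorch sketch of a cycle-consistency loss of this kind, assuming $F_{AB}$ and $F_{BA}$ are the two mapping networks; the linear architectures and data below are placeholders. The loss only penalizes round trips on samples from each domain, which is why it does not explicitly constrain $F_{BA}$ to be the operator inverse of $F_{AB}$.

```python
import torch
import torch.nn as nn

# Hypothetical mapping networks between domains A and B; the architectures
# are placeholders, not the ones used in the paper.
F_AB = nn.Linear(16, 16)
F_BA = nn.Linear(16, 16)

x_a = torch.randn(8, 16)  # a batch of samples from domain A
x_b = torch.randn(8, 16)  # a batch of samples from domain B

# Cycle-consistency loss: penalizes the round trips A -> B -> A and
# B -> A -> B only on the observed data, not as an operator identity.
cycle_loss = (F_BA(F_AB(x_a)) - x_a).abs().mean() \
           + (F_AB(F_BA(x_b)) - x_b).abs().mean()
cycle_loss.backward()
```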