1 code implementation • 4 May 2023 • Hongyi Wang, Saurabh Agarwal, Pongsakorn U-chupala, Yoshiki Tanaka, Eric P. Xing, Dimitris Papailiopoulos
Cuttlefish leverages the observation that after a few epochs of full-rank training, the stable rank (i. e., an approximation of the true rank) of each layer stabilizes at a constant value.
no code implementations • 13 Nov 2018 • Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama
Scaling the distributed deep learning to a massive GPU cluster level is challenging due to the instability of the large mini-batch training and the overhead of the gradient synchronization.