3 Nov 2023 • Dayal Singh Kalra, Tianyu He, Maissam Barkeshli
In gradient descent dynamics of neural networks, the top eigenvalue of the Hessian of the loss (sharpness) displays a variety of robust phenomena throughout training.
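To make the definition concrete, the following is a minimal sketch of estimating sharpness by power iteration on Hessian-vector products. It is not the authors' method. The toy quadratic loss, the finite-difference Hessian-vector product, and the names `grad`, `hvp`, and `sharpness` are all illustrative assumptions. Power iteration converges to the eigenvalue of largest magnitude, which matches the top eigenvalue only when that eigenvalue is positive and dominant, as it is in this toy case.

```python
import math


def grad(w):
    # Gradient of a toy loss L(w) = w0^2 + w1^2 + w0*w1 (illustrative assumption).
    # Its Hessian is the constant matrix [[2, 1], [1, 2]], with eigenvalues 3 and 1.
    return [2.0 * w[0] + w[1], w[0] + 2.0 * w[1]]


def hvp(grad_fn, w, v, eps=1e-4):
    # Central finite-difference Hessian-vector product:
    # H v ~ (g(w + eps*v) - g(w - eps*v)) / (2*eps).
    g_plus = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    g_minus = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(g_plus, g_minus)]


def sharpness(grad_fn, w, iters=100):
    # Power iteration: repeatedly apply the Hessian to a unit vector and renormalize.
    v = [1.0] + [0.0] * (len(w) - 1)  # unit-norm starting vector
    lam = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            return 0.0  # H v vanished; no curvature along v
        # Rayleigh quotient v^T H v, valid because v has unit norm.
        lam = sum(a * b for a, b in zip(v, hv))
        v = [x / norm for x in hv]
    return lam


estimate = sharpness(grad, [0.5, -0.3])
```

For a real network, the finite-difference product would typically be replaced by an automatic-differentiation Hessian-vector product, and the estimate would be recomputed at intervals during training to track how sharpness evolves.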