13 May 2021 • Siddhartha Satpathi, R Srikant
We consider the dynamics of gradient descent (GD) in overparameterized single-hidden-layer neural networks with a squared loss function.
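To make the setting concrete, here is a minimal, illustrative sketch of full-batch GD on a single-hidden-layer network with squared loss, where the width `m` is much larger than the number of samples `n` (the overparameterized regime). The modeling choices are assumptions for illustration only and are not taken from the paper: a `tanh` activation, a fixed random ±1 output layer with only the hidden weights trained, a `1/sqrt(m)` output scaling, unit-norm inputs, and the hypothetical helper names `forward`, `gd_step`, and `train`.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def forward(W, a, x):
    """Network output f(x) = (1/sqrt(m)) * sum_j a_j * tanh(w_j . x).

    Assumption: tanh activation and 1/sqrt(m) output scaling.
    """
    m = len(W)
    return sum(a_j * math.tanh(dot(w_j, x)) for w_j, a_j in zip(W, a)) / math.sqrt(m)


def squared_loss(W, a, X, y):
    """L(W) = 1/2 * sum_i (f(x_i) - y_i)^2."""
    return 0.5 * sum((forward(W, a, x) - t) ** 2 for x, t in zip(X, y))


def gd_step(W, a, X, y, lr):
    """One full-batch GD step on the hidden weights W (output weights a stay fixed)."""
    scale = 1.0 / math.sqrt(len(W))
    residuals = [forward(W, a, x) - t for x, t in zip(X, y)]
    new_W = []
    for w_j, a_j in zip(W, a):
        grad = [0.0] * len(w_j)
        for x, r in zip(X, residuals):
            s = 1.0 - math.tanh(dot(w_j, x)) ** 2  # tanh'(w_j . x)
            coef = r * scale * a_j * s
            for k in range(len(grad)):
                grad[k] += coef * x[k]
        new_W.append([wk - lr * gk for wk, gk in zip(w_j, grad)])
    return new_W


def train(n=4, d=3, m=256, lr=0.2, steps=300, seed=0):
    """Train a width-m network on n random unit-norm points; return the loss history."""
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        v = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(dot(v, v))
        X.append([vi / norm for vi in v])  # unit-norm inputs (assumption)
    y = [rng.uniform(-1, 1) for _ in range(n)]
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]  # hidden weights
    a = [rng.choice([-1.0, 1.0]) for _ in range(m)]  # fixed output layer
    losses = [squared_loss(W, a, X, y)]
    for _ in range(steps):
        W = gd_step(W, a, X, y, lr)
        losses.append(squared_loss(W, a, X, y))
    return losses


if __name__ == "__main__":
    history = train()
    print("initial loss:", history[0], "final loss:", history[-1])
```

With a step size that is small relative to the loss's smoothness constant, each GD step decreases the squared loss, so the printed final loss is below the initial one. Fixing the output layer and scaling by `1/sqrt(m)` is one common way such overparameterized analyses are set up; other parameterizations are equally possible.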