no code implementations • 13 May 2021 • Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada
First, we show that the squared loss converges exponentially to its optimum at a rate that depends on the level of imbalance and the margin of the initialization.
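A minimal NumPy sketch of this setting, not the paper's code: a single-hidden-layer linear network trained on the squared loss with an imbalanced initialization (the dimensions, scales, and step size below are illustrative assumptions). Exponential convergence shows up as the printed loss dropping by a roughly constant factor per fixed number of steps.

```python
import numpy as np

# Toy illustration (assumption: plain NumPy, sizes and step size chosen arbitrarily).
rng = np.random.default_rng(1)
d, h, n = 4, 16, 30
X = rng.standard_normal((d, n))
y = rng.standard_normal((1, n))

# Imbalanced initialization: output layer much larger than input layer.
V = 0.1 * rng.standard_normal((h, d))   # input weights
u = 2.0 * rng.standard_normal((1, h))   # output weights

lr = 1e-3
for t in range(5001):
    R = u @ V @ X - y                   # residual of the end-to-end linear map
    loss = 0.5 * np.mean(R**2)
    if t % 1000 == 0:
        print(f"step {t:5d}  loss {loss:.3e}")
    gu = R @ (V @ X).T / n              # dL/du for L = ||uVX - y||^2 / (2n)
    gV = u.T @ R @ X.T / n              # dL/dV
    u -= lr * gu
    V -= lr * gV
```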
no code implementations • 1 Jan 2021 • Salma Tarmoun, Guilherme França, Benjamin David Haeffele, Rene Vidal
More precisely, gradient flow preserves the difference of the Gramian matrices of the input and output weights, and we show that the amount of acceleration depends on both the magnitude of that difference (which is fixed at initialization) and the spectrum of the data.
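This conserved quantity can be checked numerically. The sketch below (plain NumPy; all dimensions, data, and the step size are assumptions, not the paper's setup) approximates gradient flow with small-step gradient descent on a two-layer linear network and verifies that W1 @ W1.T - W2.T @ W2 stays essentially constant during training.

```python
import numpy as np

# Numerical check (assumption: toy NumPy model, small-step GD as a proxy for gradient flow).
rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 5, 8, 3, 20

X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, n))
W1 = 0.5 * rng.standard_normal((d_hidden, d_in))
W2 = 0.5 * rng.standard_normal((d_out, d_hidden))

def imbalance(W1, W2):
    # Difference of the Gramians of input and output weights.
    return W1 @ W1.T - W2.T @ W2

D0 = imbalance(W1, W2)
lr = 1e-4
for _ in range(20000):
    R = W2 @ W1 @ X - Y              # residual
    gW2 = R @ (W1 @ X).T / n         # dL/dW2 for L = ||W2 W1 X - Y||_F^2 / (2n)
    gW1 = W2.T @ R @ X.T / n         # dL/dW1
    W2 -= lr * gW2
    W1 -= lr * gW1

drift = np.linalg.norm(imbalance(W1, W2) - D0)
print(f"imbalance drift after training: {drift:.2e}")  # near zero, up to discretization error
```

Under exact gradient flow the time derivatives of W1 @ W1.T and W2.T @ W2 coincide, so their difference is invariant; the small residual drift printed here comes only from discretizing the flow.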
no code implementations • 1 Jan 2021 • Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada
In this paper, we present a novel analysis of overparametrized single-hidden-layer linear networks, which formally connects initialization, optimization, and overparametrization with generalization performance.