no code implementations • 4 Mar 2024 • Damien Teney, Armand Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad
Prevailing explanations are based on implicit biases of gradient descent (GD) but they cannot account for the capabilities of models from gradient-free methods nor the simplicity bias recently observed in untrained networks.