no code implementations • 24 Jan 2024 • Matteo Alleman, Jack W Lindsey, Stefano Fusi
By studying the learning dynamics of networks with one hidden layer, we discovered that the network's activation function has an unexpectedly strong impact on representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs.
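As a rough illustration of the kind of comparison described (a minimal sketch, not the paper's actual experimental setup), one could train two matched one-hidden-layer networks that differ only in activation function and measure how aligned the hidden representation is with the inputs versus the targets. The toy task, hidden width, hyperparameters, and the use of linear CKA as the geometry measure are all assumptions made here for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def linear_cka(X, Y):
    """Linear centered kernel alignment between two feature matrices (samples x dims)."""
    X = X - X.mean(0, keepdim=True)
    Y = Y - Y.mean(0, keepdim=True)
    num = (X.T @ Y).norm() ** 2
    den = (X.T @ X).norm() * (Y.T @ Y).norm()
    return (num / den).item()

def train_one_hidden_layer(act, X, Y, hidden=100, steps=2000, lr=0.05):
    """Train a one-hidden-layer network with the given activation; return hidden activations."""
    net = nn.Sequential(nn.Linear(X.shape[1], hidden), act, nn.Linear(hidden, Y.shape[1]))
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(X), Y)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return net[1](net[0](X))  # activations of the hidden layer

# Toy task (an assumption, not the paper's setup): random inputs, structured binary targets.
X = torch.randn(512, 20)
Y = torch.sign(X @ torch.randn(20, 3))

for name, act in [("tanh", nn.Tanh()), ("relu", nn.ReLU())]:
    H = train_one_hidden_layer(act, X, Y)
    print(f"{name}: CKA(hidden, inputs)={linear_cka(H, X):.2f}, "
          f"CKA(hidden, targets)={linear_cka(H, Y):.2f}")
```

Under the abstract's claim, one would expect the Tanh network's hidden layer to score higher alignment with the targets and the ReLU network's to score higher alignment with the inputs; the specific alignment metric and task here are illustrative choices, not those used in the paper.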