1 Jan 2021 • Zhaodong Chen, Zhao WeiQin, Lei Deng, Guoqi Li, Yuan Xie
Moreover, an analysis of the activation mean in the forward pass reveals that the self-normalization property weakens as the fan-in of each layer grows, which explains the performance degradation on large benchmarks such as ImageNet.
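The excerpt names neither the activation function nor the analysis procedure. The following is a minimal sketch under explicit assumptions. It takes SELU (Klambauer et al., 2017) as the self-normalizing activation, uses LeCun-normal weights N(0, 1/fan_in), and builds a plain fully connected network whose width equals its fan-in. For each layer it records the mean activation, which lets you compare how far the forward-pass mean drifts at different fan-ins. `layer_means` is a hypothetical helper written for illustration. It is not the authors' method and makes no claim to reproduce their results.

```python
import math
import random
from typing import List

# SELU constants from Klambauer et al. (2017), "Self-Normalizing Neural Networks".
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled exponential linear unit."""
    return SCALE * (x if x > 0.0 else ALPHA * (math.exp(x) - 1.0))


def layer_means(fan_in: int, depth: int, batch: int, seed: int = 0) -> List[float]:
    """Hypothetical helper: mean activation after each layer of a square SELU MLP.

    Assumptions (not from the source): width == fan_in for every layer,
    no biases, and LeCun-normal weights drawn from N(0, 1/fan_in).
    """
    rng = random.Random(seed)
    # Inputs drawn from N(0, 1), i.e. starting at SELU's (mean 0, variance 1) fixed point.
    acts = [[rng.gauss(0.0, 1.0) for _ in range(fan_in)] for _ in range(batch)]
    std = 1.0 / math.sqrt(fan_in)
    means = []
    for _ in range(depth):
        # Fresh random weight matrix for this layer (fan_in x fan_in).
        weights = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_in)]
        # Pre-activation = row . sample, then SELU, for every sample in the batch.
        acts = [
            [selu(sum(w * a for w, a in zip(row, sample))) for row in weights]
            for sample in acts
        ]
        # Mean over all units and all samples in the batch.
        total = sum(sum(sample) for sample in acts)
        means.append(total / (batch * fan_in))
    return means


if __name__ == "__main__":
    # Compare per-layer means across a few fan-ins (small sizes keep pure Python fast).
    for n in (16, 64, 128):
        print(n, [round(m, 3) for m in layer_means(n, depth=6, batch=8)])
```

This uses pure Python so that it runs without dependencies. A real study would use a tensor library, larger batches, and several random seeds, because a single small batch gives a noisy estimate of the mean.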