18 Jun 2022 • Ri Su, Alphonse Houssou Hounye, Cong Cao, Muzhou Hou
When the output layer uses a Sigmoid activation function, the linearly summed activation values of parallel structures tend to push samples into the weak-gradient interval during training. The resulting weak gradients reduce the effectiveness of training.
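The effect can be illustrated with a minimal sketch. It is not the authors' code: the function names, the choice of three parallel branches, and the pre-activation value of 2.0 per branch are assumptions made purely for illustration. The sketch compares the Sigmoid derivative at a single branch's output with the derivative at the sum of several branch outputs.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the Sigmoid: s(z) * (1 - s(z)); its maximum is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def summed_branch_grad(branch_outputs):
    """Sigmoid derivative at the linear sum of parallel branch outputs.

    `branch_outputs` is a hypothetical list of pre-activation values,
    one per parallel branch (an assumption for this illustration).
    """
    z = sum(branch_outputs)
    return sigmoid_grad(z)


# One branch alone, with an assumed pre-activation of 2.0.
single = sigmoid_grad(2.0)                    # roughly 0.105

# Three such branches summed before the output Sigmoid (z = 6.0).
summed = summed_branch_grad([2.0, 2.0, 2.0])  # roughly 0.0025

print(f"single branch gradient: {single:.4f}")
print(f"summed branch gradient: {summed:.4f}")
```

Under these assumed values, summing the branches moves the pre-activation from 2.0 to 6.0, further into the saturated tail of the Sigmoid, so the gradient passed back through the output layer shrinks by more than an order of magnitude.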