Learning sparse DNNs with soft thresholding of weights during training

29 Sep 2021  ·  Antoine Vanderschueren, Christophe De Vleeschouwer

This paper proposes a new and simple way of training sparse neural networks. Our method is based on decoupling the forward and backward paths: the weights used in the forward path are a thresholded version of the weights maintained in the backward path. This decoupling allows micro-updates produced by gradient descent to accumulate, so that weights set to zero in earlier training steps can later be re-activated. At the end of training, links with zero weights are pruned away. Two additional key aspects of our approach are (i) the progressive increase of the zeroed-weight ratio during training, and (ii) the use of soft-thresholding rather than hard-thresholding to derive the forward-path weights from those maintained in the backward path. At constant accuracy, our approach requires a single training cycle, compared to the multiple cycles of state-of-the-art recursive pruning methods. At high pruning rates, it also improves model accuracy compared to other single-cycle pruning approaches (66.18% top-1 accuracy when training a ResNet-50 on ImageNet at 98% sparsity).
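A minimal sketch of the idea described in the abstract, assuming a PyTorch setting: dense weights are maintained and updated in the backward path, while the forward path uses a soft-thresholded copy. The straight-through gradient routing, the magnitude-based choice of the threshold, and the `SoftThresholdLinear` / `threshold_for_sparsity` names are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def soft_threshold(w: torch.Tensor, tau: float) -> torch.Tensor:
    """Soft-thresholding: shrink |w| by tau and zero anything below it."""
    return torch.sign(w) * torch.clamp(w.abs() - tau, min=0.0)


def threshold_for_sparsity(w: torch.Tensor, sparsity: float) -> float:
    """Pick tau so that roughly `sparsity` of the weights are zeroed.
    (Illustrative choice; the paper's schedule may differ.)"""
    k = int(sparsity * w.numel())
    if k == 0:
        return 0.0
    return w.abs().flatten().kthvalue(k).values.item()


class SoftThresholdLinear(nn.Module):
    """Linear layer whose forward weights are a soft-thresholded copy of
    the dense weights maintained by the optimizer (backward path)."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.sparsity = 0.0  # progressively increased during training

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tau = threshold_for_sparsity(self.weight.detach(), self.sparsity)
        w_sparse = soft_threshold(self.weight, tau)
        # Straight-through-style trick (an assumption): the forward pass
        # uses w_sparse, but gradients reach the dense self.weight, so
        # weights that were zeroed out can accumulate updates and
        # re-activate in later steps.
        w_forward = self.weight + (w_sparse - self.weight).detach()
        return F.linear(x, w_forward, self.bias)
```

In this sketch, `sparsity` would be raised on some schedule over the course of training, and links whose forward weights are zero under the final threshold would be pruned at the end; both the schedule and the per-layer threshold selection are illustrative details not specified in the abstract.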

