How does topology of neural architectures impact gradient propagation and model performance?

DenseNets introduce concatenation-type skip connections and achieve state-of-the-art accuracy on several computer vision tasks. In this paper, we reveal that the topology of these concatenation-type skip connections is closely related to gradient propagation which, in turn, enables a predictable behavior of DNNs’ test performance. To this end, we introduce a new metric called NN-Mass to quantify how effectively information flows through DNNs. Moreover, we empirically show that NN-Mass also works for other types of skip connections, e.g., for ResNets, Wide-ResNets (WRNs), and MobileNets, which contain addition-type skip connections (i.e., residuals or inverted residuals). As such, for both DenseNet-like CNNs and ResNets/WRNs/MobileNets, our theoretically grounded NN-Mass can identify models with similar accuracy despite significantly different size/compute requirements. Detailed experiments on both synthetic and real datasets (e.g., MNIST, CIFAR-10, CIFAR-100, ImageNet) provide extensive evidence for our insights. Finally, the closed-form equation of our NN-Mass enables us to design significantly compressed DenseNets (for CIFAR-10) and MobileNets (for ImageNet) directly at initialization, without time-consuming training or searching.
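As a minimal sketch of the idea, a topology score in the spirit of NN-Mass can be computed per cell from width, depth, and the number of skip connections actually present, scaled by the cell's size. The candidate-link denominator and the exact scaling below are illustrative assumptions, not the paper's exact closed form; `nn_mass_cell`, `nn_mass`, and the example architectures are hypothetical.

```python
def nn_mass_cell(width: int, depth: int, num_skip_links: int) -> float:
    """Topology score for one cell: width * depth * skip-link density.

    `num_skip_links` counts channel-level skip connections actually present.
    The candidate total (every layer linked to every earlier layer, per
    channel) is an assumed simplification of the paper's definition.
    """
    candidate_links = width * depth * (depth - 1) // 2  # assumed denominator
    if candidate_links == 0:
        return 0.0
    density = num_skip_links / candidate_links
    return width * depth * density


def nn_mass(cells: list[tuple[int, int, int]]) -> float:
    """Sum per-cell scores over (width, depth, num_skip_links) triples."""
    return sum(nn_mass_cell(w, d, s) for (w, d, s) in cells)


# Two hypothetical architectures: a wider one with sparser skip links and
# a slimmer one whose denser connectivity yields a comparable score --
# illustrating how such a metric could flag similar-accuracy models of
# different size/compute.
wide = [(64, 8, 896), (128, 8, 1792)]   # density 0.5 per cell
slim = [(48, 8, 1008), (96, 8, 2016)]   # density 0.75 per cell
print(nn_mass(wide), nn_mass(slim))
```

Because such a score depends only on widths, depths, and link counts, it can be evaluated at initialization, before any training, which is consistent with the zero search-cost entries in the results below.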


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Neural Architecture Search | CIFAR-10 | NN-MASS-CIFAR-A | Top-1 Error Rate | 3.0% | #34 |
| | | | Search Time (GPU days) | 0 | #1 |
| | | | Parameters | 5.02M | #37 |
| | | | FLOPS | 1.95G | #2 |
| Neural Architecture Search | CIFAR-10 | NN-MASS-CIFAR-C | Top-1 Error Rate | 3.18% | #36 |
| | | | Search Time (GPU days) | 0 | #1 |
| | | | Parameters | 3.82M | #32 |
| | | | FLOPS | 1.2G | #2 |
| Neural Architecture Search | ImageNet | NN-MASS-B | Top-1 Error Rate | 26.7% | #123 |
| | | | Accuracy | 73.3% | #100 |
| | | | FLOPs | 393M | #118 |
| | | | Params | 3.7M | #55 |
| | | | MACs | 393M | #111 |
| Neural Architecture Search | ImageNet | NN-MASS-A | Top-1 Error Rate | 27.1% | #126 |
| | | | Accuracy | 72.9% | #103 |
| | | | FLOPs | 200M | #109 |
| | | | Params | 2.3M | #59 |
| | | | MACs | 200M | #70 |
