How does topology of neural architectures impact gradient propagation and model performance?
DenseNets introduce concatenation-type skip connections that achieve state-of-the-art accuracy in several computer vision tasks. In this paper, we reveal that the topology of these concatenation-type skip connections is closely related to gradient propagation, which, in turn, makes the test performance of DNNs predictable. To this end, we introduce a new metric called NN-Mass to quantify how effectively information flows through DNNs. Moreover, we empirically show that NN-Mass also works for other types of skip connections, e.g., for ResNets, Wide-ResNets (WRNs), and MobileNets, which contain addition-type skip connections (i.e., residuals or inverted residuals). As such, for both DenseNet-like CNNs and ResNets/WRNs/MobileNets, our theoretically grounded NN-Mass can identify models with similar accuracy, despite having significantly different size/compute requirements. Detailed experiments on both synthetic and real datasets (e.g., MNIST, CIFAR-10, CIFAR-100, ImageNet) provide extensive evidence for our insights. Finally, the closed-form equation of our NN-Mass enables us to design significantly compressed DenseNets (for CIFAR-10) and MobileNets (for ImageNet) directly at initialization, without time-consuming training or architecture search.
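To make the idea concrete, here is a minimal sketch of a topology-density metric in the spirit of NN-Mass. It assumes (this exact form is an illustration, not the paper's verbatim definition) that each cell contributes width × depth × density, where density is the fraction of realized skip connections out of all possible layer pairs in the cell; the helper names `cell_density` and `nn_mass` are hypothetical.

```python
# Sketch of an NN-Mass-style metric (illustrative form, not the paper's
# verbatim equation): per-cell mass = width * depth * skip-connection density,
# summed over all cells of the network.

def cell_density(num_skips: int, depth: int) -> float:
    """Fraction of realized concatenation skips out of all possible
    (earlier layer -> later layer) pairs in a cell with `depth` layers."""
    max_skips = depth * (depth - 1) // 2  # every layer pair is a candidate
    return num_skips / max_skips if max_skips else 0.0

def nn_mass(cells: list[tuple[int, int, int]]) -> float:
    """`cells` is a list of (width, depth, num_skips) tuples,
    one tuple per cell of the architecture."""
    return sum(w * d * cell_density(s, d) for (w, d, s) in cells)

# Example: three cells of depth 5; a fully connected DenseNet-style cell
# of depth 5 has 5*4/2 = 10 skip connections.
cells = [(16, 5, 10), (32, 5, 6), (64, 5, 3)]
print(nn_mass(cells))  # 16*5*1.0 + 32*5*0.6 + 64*5*0.3 = 272.0
```

Because such a metric is a closed-form function of widths, depths, and skip-connection counts only, it can be evaluated at initialization, which is what allows architectures to be compared without any training.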
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Neural Architecture Search | CIFAR-10 | NN-MASS-CIFAR-A | Top-1 Error Rate | 3.0% | #34
 | | | Search Time (GPU days) | 0 | #1
 | | | Parameters | 5.02M | #37
 | | | FLOPs | 1.95G | #2
Neural Architecture Search | CIFAR-10 | NN-MASS-CIFAR-C | Top-1 Error Rate | 3.18% | #36
 | | | Search Time (GPU days) | 0 | #1
 | | | Parameters | 3.82M | #32
 | | | FLOPs | 1.2G | #2
Neural Architecture Search | ImageNet | NN-MASS-B | Top-1 Error Rate | 26.7% | #123
 | | | Accuracy | 73.3% | #100
 | | | FLOPs | 393M | #118
 | | | Params | 3.7M | #55
 | | | MACs | 393M | #111
Neural Architecture Search | ImageNet | NN-MASS-A | Top-1 Error Rate | 27.1% | #126
 | | | Accuracy | 72.9% | #103
 | | | FLOPs | 200M | #109
 | | | Params | 2.3M | #59
 | | | MACs | 200M | #70