Response Modeling of Hyper-Parameters for Deep Convolution Neural Network

1 Jan 2021 · Mathieu Tuli, Mahdi S. Hosseini, Konstantinos N Plataniotis ·

Hyper-parameter optimization (HPO) is critical in training high performing Deep Neural Networks (DNN). Current methodologies fail to define an analytical response surface and remain a training bottleneck due to their use of additional internal hyper-parameters and lengthy evaluation cycles. We demonstrate that the low-rank factorization of the convolution weights of intermediate layers of a CNN can define an analytical response surface. We quantify how this surface acts as an auxiliary to optimizing training metrics. We introduce a dynamic tracking algorithm -- autoHyper -- that performs HPO on the order of hours for various datasets including ImageNet and requires no manual tuning. Our method -- using a single RTX2080Ti -- is able to select a learning rate within 59 hours for AdaM on ResNet34 applied to ImageNet and improves in testing accuracy by 4.93% over the default learning rate. In contrast to previous methods, we empirically prove that our algorithm and response surface generalize well across model, optimizer, and dataset selection removing the need for extensive domain knowledge to achieve high levels of performance.

PDF Abstract