Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition

20 Jun 2020 · Ionut Cosmin Duta, Li Liu, Fan Zhu, Ling Shao

This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales. PyConv contains a pyramid of kernels, where each level involves different types of filters with varying size and depth, which are able to capture different levels of detail in the scene. On top of these improved recognition capabilities, PyConv is also efficient: with our formulation, it does not increase the computational cost or the number of parameters compared to standard convolution. Moreover, it is very flexible and extensible, providing a large space of potential network architectures for different applications. PyConv has the potential to impact nearly every computer vision task and, in this work, we present different architectures based on PyConv for four main tasks in visual recognition: image classification, video action classification/recognition, object detection and semantic image segmentation/parsing. Our approach shows significant improvements across all these core tasks in comparison with the baselines. For instance, on image recognition, our 50-layer network outperforms its baseline counterpart, ResNet with 152 layers, in recognition performance on the ImageNet dataset, while having 2.39 times fewer parameters, 2.52 times lower computational complexity and more than 3 times fewer layers. On image segmentation, our novel framework sets a new state of the art on the challenging ADE20K benchmark for scene parsing. Code is available at: https://github.com/iduta/pyconv
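
The claim that a pyramid of kernels need not add parameters can be checked with simple arithmetic. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes that kernel "depth" is reduced through grouped convolution (the abstract does not spell this out), and the four-level configuration in LEVELS, the channel counts, and the helper names conv_params and pyramid_params are hypothetical choices made for the example.

```python
def conv_params(kernel_size, in_channels, out_channels, groups=1):
    """Weight count of a 2D convolution with square kernels (bias ignored).

    With grouped convolution each filter only sees in_channels / groups
    input channels, so a larger kernel can be paid for with a shallower one.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return kernel_size * kernel_size * (in_channels // groups) * out_channels


def pyramid_params(in_channels, levels):
    """Total weights of a pyramid whose levels are (kernel_size, out_channels, groups)."""
    return sum(conv_params(k, in_channels, out_c, g) for k, out_c, g in levels)


# Hypothetical four-level pyramid (an assumption for illustration):
# kernel size grows while kernel depth shrinks via more groups.
# Each level emits 16 channels, 64 output channels in total.
LEVELS = [(9, 16, 16), (7, 16, 8), (5, 16, 4), (3, 16, 1)]

pyramid = pyramid_params(64, LEVELS)   # 64 input channels
standard = conv_params(3, 64, 64)      # plain 3x3 convolution, 64 -> 64

print("pyramid weights: ", pyramid)
print("standard weights:", standard)
```

Under these assumed settings the pyramid uses 27,072 weights against 36,864 for the single 3x3 convolution with the same input and output widths, which is consistent with the abstract's statement that the formulation does not increase the parameter count. Other level configurations would give different totals.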

Results from the Paper


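Task                   Dataset     Model             Metric Name                  Metric Value  Global Rank
Semantic Segmentation  ADE20K      PyConvSegNet-152  Validation mIoU              45.99         # 174
Semantic Segmentation  ADE20K      PyConvSegNet-152  Test Score                   56.52         # 2
Semantic Segmentation  ADE20K val  PyConvSegNet-152  mIoU                         45.99         # 69
Semantic Segmentation  ADE20K val  PyConvSegNet-152  Pixel Accuracy               82.49         # 6
Image Classification   ImageNet    PyConvResNet-101  Top 1 Accuracy               81.49%        # 585
Image Classification   ImageNet    PyConvResNet-101  Number of params             42.3M         # 689
Image Classification   ImageNet    PyConvResNet-101  Hardware Burden              None          # 1
Image Classification   ImageNet    PyConvResNet-101  Operations per network pass  None          # 1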

Methods