Switching Convolutional Neural Network for Crowd Counting

We propose a novel crowd counting model that maps a given crowd scene to its density. Crowd analysis is complicated by a myriad of factors, such as inter-occlusion between people due to extreme crowding, high similarity of appearance between people and background elements, and large variability of camera viewpoints. Current state-of-the-art approaches tackle these factors by using multi-scale CNN architectures, recurrent networks, and late fusion of features from multi-column CNNs with different receptive fields. We propose a switching convolutional neural network that leverages the variation of crowd density within an image to improve the accuracy and localization of the predicted crowd count. Patches from a grid within a crowd scene are relayed to independent CNN regressors based on the crowd count prediction quality of each CNN, as established during training. The independent CNN regressors are designed to have different receptive fields, and a switch classifier is trained to relay each crowd scene patch to the best CNN regressor. We perform extensive experiments on all major crowd counting datasets and demonstrate better performance than current state-of-the-art methods. We provide interpretable representations of the multichotomy of the space of crowd scene patches inferred from the switch. We observe that the switch relays an image patch to a particular CNN column based on the density of the crowd.
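The patch-routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grid size, the names `split_into_grid`, `switched_count`, `toy_switch` and `toy_regressors`, and the intensity-threshold switch are all assumptions made for illustration. In the paper the switch is a trained classifier and the regressors are CNNs with different receptive fields; here plain functions stand in for both.

```python
from typing import Callable, List, Sequence, Tuple

Patch = List[List[float]]  # a 2-D grid of values standing in for an image patch


def split_into_grid(image: Patch, rows: int, cols: int) -> List[Patch]:
    """Cut an image into rows x cols non-overlapping patches, in row-major order.

    Pixels left over when the size is not evenly divisible are dropped.
    """
    patch_h = len(image) // rows
    patch_w = len(image[0]) // cols
    patches = []
    for r in range(rows):
        for c in range(cols):
            band = image[r * patch_h:(r + 1) * patch_h]
            patches.append([row[c * patch_w:(c + 1) * patch_w] for row in band])
    return patches


def switched_count(
    image: Patch,
    switch: Callable[[Patch], int],
    regressors: Sequence[Callable[[Patch], float]],
    rows: int = 2,  # assumed grid size, not taken from the source
    cols: int = 2,
) -> Tuple[float, List[int]]:
    """Route each patch to the regressor chosen by the switch and sum the counts."""
    total = 0.0
    routes = []
    for patch in split_into_grid(image, rows, cols):
        k = switch(patch)             # index of the regressor picked for this patch
        routes.append(k)
        total += regressors[k](patch)  # that regressor's count for the patch
    return total, routes


# Hypothetical stand-ins: a real switch is a trained classifier and each
# regressor is a CNN; here the "image" is treated as a density map.
def patch_sum(patch: Patch) -> float:
    return sum(sum(row) for row in patch)


def toy_switch(patch: Patch) -> int:
    mean = patch_sum(patch) / (len(patch) * len(patch[0]))
    return 0 if mean < 0.25 else (1 if mean < 0.75 else 2)


toy_regressors = [patch_sum, patch_sum, patch_sum]
```

Calling `switched_count(image, toy_switch, toy_regressors)` returns the summed count together with the list of regressor indices chosen per patch, which is the kind of routing information the abstract refers to when it says the switch relays patches by crowd density.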

Published at CVPR 2017.

Results from the Paper


Task            Dataset         Model       Metric Name  Metric Value  Global Rank
Crowd Counting  ShanghaiTech A  Switch-CNN  MAE          90.4          #28
Crowd Counting  ShanghaiTech B  Switch-CNN  MAE          21.6          #25
Crowd Counting  UCF CC 50       Switch-CNN  MAE          318.1         #16
Crowd Counting  WorldExpo’10    Switch-CNN  Average MAE  9.4           #11
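
For readers unfamiliar with the metric, MAE (mean absolute error) is the average absolute difference between predicted and ground-truth crowd counts over the test images. A minimal sketch follows; the function name and the example counts are hypothetical and not taken from the paper.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted count - true count| over all images."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical counts for two images: errors of 10 and 20 people.
example_mae = mean_absolute_error([100, 250], [110, 230])
```

Lower values are better, so a model's rank on a leaderboard improves as its MAE falls.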

Results from Other Papers


Task            Dataset   Model       Metric Name  Metric Value  Rank
Crowd Counting  UCF-QNRF  Switch-CNN  MAE          228           #15
Crowd Counting  Venice    Switch-CNN  MAE          52.8          #4
