EMPIRICAL UPPER BOUND IN OBJECT DETECTION

1 Jan 2021  ·  Ali Borji

Object detection remains one of the most notorious open problems in computer vision. Despite large strides in accuracy and speed in recent years, modern object detectors have started to saturate on popular benchmarks. How far can we push detection accuracy with the current deep learning tools and tricks? In this work, using two popular state-of-the-art object detection codebases, MMDetection and Detectron2, and analyzing more than 15 models over 4 large-scale datasets, we systematically determine the empirical upper bound in AP, which is 91.6% on PASCAL VOC (test2007), 78.2% on MS COCO (val2017), and 58.9% on OpenImages (V4 validation set), regardless of the IOU threshold. These numbers are much higher than the mAP of the best model (e.g., 58% on MS COCO according to the most recent results). Interestingly, the gap appears to be almost closed at IOU=0.5. We also analyze the role of context in object recognition and detection and find that objects at the canonical size yield the best recognition accuracy. Finally, we carefully characterize the sources of error in deep object detectors and find that classification error (confusion with other classes and misses) accounts for the largest fraction of errors, outweighing localization error. Further, models miss small objects far more often than medium and large ones. Our work taps into the tight relationship between object recognition and detection and offers insights for building better object detectors. Similar analyses can be conducted for other computer vision tasks such as instance segmentation and object tracking. The code is available at [TBA].
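To make the IOU-thresholded matching underlying these AP numbers concrete, here is a minimal, self-contained sketch in Python. This is not the paper's evaluation code; the (x1, y1, x2, y2) box format and the greedy highest-score-first matching rule are assumptions, though they mirror PASCAL-VOC-style evaluation.

```python
def iou(box_a, box_b):
    """IOU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(dets, gts, thresh=0.5):
    """Greedily match score-sorted detections to ground truth at an IOU
    threshold. dets: list of (score, box); gts: list of boxes.
    Returns per-detection true-positive flags and the missed GT boxes."""
    order = sorted(range(len(dets)), key=lambda i: -dets[i][0])
    matched_gt = set()
    tp = [False] * len(dets)
    for i in order:
        _, box = dets[i]
        best_j, best_iou = -1, thresh
        for j, gt in enumerate(gts):
            if j in matched_gt:
                continue  # each GT box can absorb at most one detection
            v = iou(box, gt)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched_gt.add(best_j)
            tp[i] = True  # unmatched detections count as false positives
    misses = [g for j, g in enumerate(gts) if j not in matched_gt]
    return tp, misses

# Illustrative (made-up) boxes: the first detection overlaps the GT well
# enough to match at IOU=0.5; the second becomes a false positive.
dets = [(0.9, (10, 10, 50, 50)), (0.6, (100, 100, 140, 140))]
gts = [(12, 11, 48, 52)]
tp, misses = match_detections(dets, gts, thresh=0.5)  # tp == [True, False]
```

Raising `thresh` from 0.5 toward 0.95 shrinks the set of accepted matches, which is why AP degrades at stricter IOUs and why the reported gap to the upper bound is nearly closed at IOU=0.5 but not beyond it.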
