Person Search by Text Attribute Query As Zero-Shot Learning

ICCV 2019  ·  Qi Dong, Shaogang Gong, Xiatian Zhu

Existing person search methods predominantly assume that at least one image of the queried person is available. This assumption does not hold when only a brief textual (or verbal) description of the target person is given. In this work, we present a deep learning method for person search based on an attribute text description, without any query imagery. Whilst conventional cross-modality matching methods, such as global visual-textual embedding based zero-shot learning and local individual attribute recognition, are functionally applicable, they rely on assumptions about deployment scale, data quality, and/or category name semantics that do not hold for person search. We overcome these issues by formulating an Attribute-Image Hierarchical Matching (AIHM) model, which matches text attribute descriptions with noisy surveillance person images more reliably by jointly learning global category-level and local attribute-level textual-visual embedding and matching. Extensive evaluations demonstrate the superiority of our AIHM model over a wide variety of state-of-the-art methods on three publicly available attribute-labelled surveillance person search benchmarks: Market-1501, DukeMTMC, and PA100K.
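To make the hierarchical matching idea concrete, below is a minimal PyTorch sketch of scoring a binary attribute query against an image feature at both a global (whole-description) level and a local (per-attribute) level, then fusing the two. This is an illustrative reconstruction, not the authors' released AIHM implementation: the module name `HierarchicalMatcher`, the dimensions (`num_attrs`, `img_dim`, `emb_dim`), and the additive score fusion are all assumptions for the sketch.

```python
# Illustrative sketch only, NOT the authors' AIHM code: layer choices,
# dimensions, and the fusion scheme below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalMatcher(nn.Module):
    def __init__(self, num_attrs=27, img_dim=2048, emb_dim=512):
        super().__init__()
        # Global branch: embed the whole binary attribute vector and the
        # whole image feature into a shared category-level space.
        self.global_txt = nn.Linear(num_attrs, emb_dim)
        self.global_img = nn.Linear(img_dim, emb_dim)
        # Local branch: one learnable embedding per attribute, matched
        # against a per-attribute projection of the image feature.
        self.attr_emb = nn.Embedding(num_attrs, emb_dim)
        self.local_img = nn.Linear(img_dim, num_attrs * emb_dim)
        self.num_attrs, self.emb_dim = num_attrs, emb_dim

    def forward(self, attr_vec, img_feat):
        # attr_vec: (B, num_attrs) binary (float) text-attribute query
        # img_feat: (B, img_dim) CNN feature of a gallery person image
        g_t = F.normalize(self.global_txt(attr_vec), dim=-1)
        g_v = F.normalize(self.global_img(img_feat), dim=-1)
        global_score = (g_t * g_v).sum(-1)                    # (B,)

        l_v = self.local_img(img_feat).view(-1, self.num_attrs, self.emb_dim)
        l_t = self.attr_emb.weight                            # (num_attrs, emb_dim)
        # Per-attribute cosine similarity, averaged over queried attributes only.
        local_sim = F.cosine_similarity(l_v, l_t.unsqueeze(0), dim=-1)
        local_score = (local_sim * attr_vec).sum(-1) / attr_vec.sum(-1).clamp(min=1)

        return global_score + local_score                     # fused match score

# Usage: rank 100 gallery images against one attribute query.
model = HierarchicalMatcher()
query = torch.zeros(1, 27)
query[0, [2, 5, 11]] = 1.0                  # attributes present in the description
gallery = torch.randn(100, 2048)            # placeholder image features
scores = model(query.expand(100, -1), gallery)
```

In deployment, gallery images would be ranked by this fused score; the global branch captures whole-description compatibility while the local branch remains robust when only a few attributes are described, which is the hierarchy the abstract refers to.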
