Cross-Modal Person Search: A Coarse-to-Fine Framework using Bi-Directional Text-Image Matching

Xiaojing Yu, Tianlong Chen, Yang Yang, Michael Mugo, Zhangyang Wang
2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
Searching for person images in a gallery based on natural-language descriptions remains a challenging and under-explored cross-modal retrieval problem. Re-ranking is known to be an effective post-processing tool for improving the accuracy of image-based retrieval tasks such as person re-identification (Person Re-Id). In this paper, we extend re-ranking from uni-modal retrieval to cross-modal retrieval for the first time, and develop a bi-directional coarse-to-fine framework (BCF) for ... cross-modal person search. Built on a recent state-of-the-art Person Re-Id model [5], BCF exploits first text-to-image and then image-to-text relevance in a two-stage refinement fashion. Compared with a strong baseline [24] on the newly introduced WIDER Person Search dataset [1], BCF boosts validation-set performance by 9.01% (top-1) / 3.87% (mAP) on val1 and by 6.60% (top-1) / 3.49% (mAP) on val2. With this high score, our solution ranks competitively in the ICCV 2019 WIDER Person Search by Language Challenge.
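The abstract only outlines BCF, so the following is a minimal sketch, under stated assumptions, of generic two-stage bi-directional re-ranking together with the top-1 and mAP metrics quoted above. It is not the scoring used in the paper: the similarity matrix `sim`, the candidate depth `k`, the fusion weight `alpha`, the reciprocal-rank image-to-text score and all function names are hypothetical choices made for illustration. The coarse stage ranks gallery images for a text query; the fine stage re-scores only the top-k candidates by checking how highly the query text ranks when each candidate image is used as the query.

```python
from typing import List, Sequence, Set, Tuple


def text_to_image_ranking(sim: Sequence[Sequence[float]], q: int) -> List[int]:
    """Coarse stage: order gallery images by text-to-image similarity, best first."""
    row = sim[q]
    return sorted(range(len(row)), key=lambda g: row[g], reverse=True)


def image_to_text_rank(sim: Sequence[Sequence[float]], q: int, g: int) -> int:
    """0-based rank of text q when image g queries all texts (ties favour q)."""
    target = sim[q][g]
    return sum(1 for t in range(len(sim)) if sim[t][g] > target)


def bidirectional_rerank(
    sim: Sequence[Sequence[float]], q: int, k: int = 3, alpha: float = 0.5
) -> List[int]:
    """Fine stage: re-score only the top-k coarse candidates using image-to-text evidence."""
    coarse = text_to_image_ranking(sim, q)
    head, tail = coarse[:k], coarse[k:]

    def fused_score(g: int) -> float:
        # Hypothetical fusion: reciprocal rank of the query text in the
        # image-to-text direction, blended with the forward similarity.
        reverse = 1.0 / (1 + image_to_text_rank(sim, q, g))
        return alpha * sim[q][g] + (1 - alpha) * reverse

    # Candidates beyond the top-k keep their coarse order.
    return sorted(head, key=fused_score, reverse=True) + tail


def average_precision(ranking: Sequence[int], relevant: Set[int]) -> float:
    """Mean of precision@i over the positions i where a relevant image appears."""
    hits, precision_sum = 0, 0.0
    for pos, g in enumerate(ranking, start=1):
        if g in relevant:
            hits += 1
            precision_sum += hits / pos
    return precision_sum / len(relevant) if relevant else 0.0


def top1_and_map(
    rankings: Sequence[Sequence[int]], relevants: Sequence[Set[int]]
) -> Tuple[float, float]:
    """Top-1 accuracy and mean average precision over all text queries."""
    n = len(rankings)
    top1 = sum(1 for r, rel in zip(rankings, relevants) if r and r[0] in rel) / n
    mean_ap = sum(average_precision(r, rel) for r, rel in zip(rankings, relevants)) / n
    return top1, mean_ap


if __name__ == "__main__":
    # Toy similarities: 2 text queries (rows) x 3 gallery images (columns).
    sim = [[0.80, 0.78, 0.10],   # text 0 describes image 1
           [0.85, 0.20, 0.30]]   # text 1 describes image 0
    relevants = [{1}, {0}]
    coarse = [text_to_image_ranking(sim, q) for q in range(len(sim))]
    fine = [bidirectional_rerank(sim, q, k=2) for q in range(len(sim))]
    print(top1_and_map(coarse, relevants))  # -> (0.5, 0.75)
    print(top1_and_map(fine, relevants))    # -> (1.0, 1.0)
```

In the toy data, the coarse stage puts image 0 first for text 0 because of a slightly higher similarity, but image 0 matches text 1 even better; the image-to-text check demotes it, which is the kind of error a reverse-direction pass is meant to catch. Restricting the fine stage to the top-k candidates keeps its extra cost bounded.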
doi:10.1109/iccvw.2019.00223 dblp:conf/iccvw/YuCYMW19 fatcat:tcizrmwbcvgnjbxnxe7l5zlh2y