An efficient framework for visible-infrared cross modality person re-identification

Emrah Basaran, Muhittin Gökmen, Mustafa E. Kamasak
2020, Signal Processing: Image Communication
Abstract. Visible-infrared cross-modality person re-identification (VI-ReId) is an essential task for video surveillance in poorly illuminated or dark environments. Despite many recent studies on person re-identification in the visible domain (ReId), few studies deal specifically with VI-ReId. Besides the challenges common to both ReId and VI-ReId, such as pose/illumination variations, background clutter and occlusion, VI-ReId has additional challenges because color information is not available in infrared images. As a result, the performance of VI-ReId systems is typically lower than that of ReId systems. In this work, we propose a four-stream framework to improve VI-ReId performance. We train a separate deep convolutional neural network in each stream using different representations of the input images. We expect that different and complementary features can be learned from each stream. In our framework, grayscale and infrared input images are used to train the ResNet in the first stream. In the second stream, RGB and three-channel infrared images (created by repeating the infrared channel) are used. In the remaining two streams, we use local pattern maps as input images. These maps are generated using the local Zernike moments transformation. Local pattern maps are obtained from grayscale and infrared images in the third stream and from RGB and three-channel infrared images in the last stream. We further improve the performance of the proposed framework by employing a re-ranking algorithm for post-processing. Our results indicate that the proposed framework outperforms the current state of the art by a large margin, improving Rank-1/mAP by 29.79%/30.91% on the SYSU-MM01 dataset and by 9.73%/16.36% on the RegDB dataset.

Most surveillance cameras used at night or in the dark operate in infrared mode in order to cope with poor illumination. Therefore, matching person images captured by visible and infrared cameras is an important issue for video surveillance and other applications. This issue is studied in the literature as visible-infrared cross-modality person re-identification (VI-ReId) [1] [2] [3] [4] [5]. VI-ReId is the problem of retrieving the images of a person from a gallery set consisting of RGB (or infrared) images, given an infrared (or RGB) query image. For ReId, color is one of the most important cues in person images.
Therefore, the lack of color information in infrared images makes VI-ReId a very challenging problem. In this paper, we show that ResNet [6] architectures trained on RGB and infrared images together can outperform the current state of the art. These architectures, which are widely used for image classification and other computer vision problems, can learn common feature representations for RGB and infrared images of the same individual, as well as the properties that distinguish individuals, better than the existing methods proposed for VI-ReId. In this study, we introduce a four-stream framework built with ResNet architectures. There is no weight sharing between the ResNets in the framework.
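
The input representations described for the first two streams can be illustrated with a short sketch. It is not the authors' code. The helper names (`rgb_to_grayscale`, `repeat_infrared`, `stream_inputs`) are hypothetical, and the grayscale conversion assumes ITU-R BT.601 luma weights, which the text does not specify. The local pattern maps of the third and fourth streams are omitted, since the local Zernike moments transformation is not detailed here.

```python
import numpy as np

# Assumption: ITU-R BT.601 luma weights. The source does not say how
# grayscale images are derived from RGB.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def rgb_to_grayscale(rgb):
    """Collapse an (H, W, 3) RGB image to a single-channel (H, W) image."""
    return rgb.astype(np.float64) @ LUMA_WEIGHTS


def repeat_infrared(ir):
    """Build an (H, W, 3) image by repeating the (H, W) infrared channel."""
    return np.repeat(ir[..., np.newaxis], 3, axis=-1)


def stream_inputs(image, modality):
    """Return the inputs for streams 1 and 2 (hypothetical helper).

    Stream 1 pairs grayscale visible images with infrared images.
    Stream 2 pairs RGB images with three-channel infrared images.
    """
    if modality == "visible":
        return {"stream1": rgb_to_grayscale(image), "stream2": image}
    if modality == "infrared":
        return {"stream1": image, "stream2": repeat_infrared(image)}
    raise ValueError(f"unknown modality: {modality!r}")


if __name__ == "__main__":
    rgb = np.zeros((256, 128, 3), dtype=np.uint8)  # dummy visible image
    ir = np.zeros((256, 128), dtype=np.uint8)      # dummy infrared image
    for name, img in stream_inputs(rgb, "visible").items():
        print("visible", name, img.shape)
    for name, img in stream_inputs(ir, "infrared").items():
        print("infrared", name, img.shape)
```

Within each stream, both modalities end up with the same number of channels (one in the first stream, three in the second), so a single network per stream can consume visible and infrared images alike.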
doi:10.1016/j.image.2020.115933 fatcat:2jbxbuw3dnfyzdpym3sibgvye4