Salient Object Detection and Segmentation in Video Surveillance

Siyue Yu
Video surveillance captures scene information for applications such as crime investigation, security systems, autonomous driving systems, and environmental monitoring. Deep learning based video surveillance has recently become an essential topic in computer vision, with specific tasks including object tracking, video object segmentation, salient object detection, and video salient object detection. This thesis studies salient object detection and segmentation in video surveillance, focusing mainly on video object segmentation and salient object detection. For video object segmentation, we study the setting in which the first frame's mask is given and aim to design a network that adapts to variations in object appearance. To this end, this thesis proposes a framework based on the non-local attention mechanism that localizes and segments the target object in the current frame by referring to both the first frame with its given mask and the previous frame with its predicted mask. Our approach achieves 86.5$\%$ IoU on DAVIS-2016 and 72.2$\%$ IoU on DAVIS-2017 at a speed of 0.11s per frame. For salient object detection, this thesis focuses on scribble annotations. Scribbles, however, do not contain enough integral appearance information. To solve this problem, a local saliency coherence loss is proposed to assist the partial cross-entropy loss and thereby help the network learn more complete object information. Further, a self-consistent mechanism is designed to make the network less sensitive to different input scales. Our method achieves results comparable with fully supervised methods and a new state-of-the-art performance on six benchmarks (e.g. for the ECSSD dataset: $F_\beta$ = 0.8995, $E_\xi$ = 0.9079 and MAE = 0.0489). Lastly, co-salient object detection is also studied. Recent methods explore both intra- and inter-image consistency through an attention mechanism. We find that existing attention mechanisms can only focus on limited related pixels. Thus, we propose a new framework with a self-contrastiv [...]
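As a rough illustration of why scribble supervision is incomplete, the sketch below computes a partial cross-entropy loss, that is, a binary cross-entropy averaged only over pixels covered by a scribble. It is a minimal sketch and not the thesis implementation: the function name `partial_cross_entropy`, the label convention (1 for a foreground scribble, 0 for a background scribble, `None` for an unlabeled pixel), and the toy values are all assumptions made for illustration.

```python
import math

# Label convention assumed for this sketch (not taken from the thesis):
# 1 = foreground scribble, 0 = background scribble, None = unlabeled pixel.
def partial_cross_entropy(pred, scribble, eps=1e-7):
    """Mean binary cross-entropy over scribble-labeled pixels only."""
    total, count = 0.0, 0
    for pred_row, label_row in zip(pred, scribble):
        for p, y in zip(pred_row, label_row):
            if y is None:
                continue  # unlabeled pixels are skipped entirely
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0

# Toy 2x2 saliency prediction with one foreground and one background scribble pixel.
pred = [[0.9, 0.6], [0.2, 0.5]]
scribble = [[1, None], [0, None]]
loss = partial_cross_entropy(pred, scribble)
```

Because unlabeled pixels are skipped, changing the predictions at those positions leaves this loss unchanged, which is the gap that an auxiliary term such as the local saliency coherence loss is described as addressing.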
doi:10.17638/03164778 fatcat:saegwj3v2bd4dkflu2vghwbnfm