Multimodal Stereo from Thermal Infrared and Visible Spectrum

Fernando Barrera
2014 ELCVIA Electronic Letters on Computer Vision and Image Analysis  
Recent advances in thermal infrared (LWIR) imaging have allowed its use in applications beyond the military domain. Nowadays, this new family of sensors is included in different technical and scientific applications. They offer features that facilitate tasks such as the detection of pedestrians, hot spots, and differences in temperature, which can significantly improve the performance of systems where people are expected to play the principal role, for instance video surveillance, monitoring, and pedestrian detection. The dissertation poses the following question: could a pair of sensors measuring different bands of the electromagnetic spectrum, such as the visible and the thermal infrared, be used to extract depth information? Although it is a complex question, we show that such a system is feasible, and we discuss its advantages, drawbacks, and potential opportunities. In this research, an experimental study comparing different cost functions and matching approaches is performed in order to build a multimodal stereovision system. Furthermore, the common problems in infrared/visible stereo, especially in outdoor scenes, are identified. Our framework summarizes the architecture of a generic stereo algorithm at different levels (computational, functional, and structural), and it can be extended toward high-level fusion (semantic) and high-order (prior). The proposed framework is intended to explore novel multimodal stereo matching approaches, going from sparse to dense representations (both disparity and depth maps). Moreover, context information is added in the form of priors and assumptions. Finally, the dissertation shows a promising path toward the integration of multiple sensors for recovering three-dimensional information. The dissertation covers the main aspects of a multimodal stereo system: camera setup, matching cost functions, and disparity computation. The first part presents several experiments carried out with different camera configurations. As a tangible result, two multimodal datasets and their corresponding ground truth data were acquired and published. These datasets consist of: (i) thermal infrared and visible images in raw format as well as their rectified versions; (ii) disparity maps; (iii) 3D point clouds; (iv) hand-annotated planar regions; (v) synthesized disparity maps; and (vi) labeled image regions (valid and occluded image regions).
To our knowledge, no similar datasets are available for evaluation and comparison. The second part presents a study of the different matching cost functions proposed during this dissertation. Finally, two dense stereo matching algorithms for ...
Recommended for acceptance by ELCVIA
doi:10.5565/rev/elcvia.619 fatcat:5g5piin3gvfe7egqyyqorhoo6i