3D Scene Mesh from CNN Depth Predictions and Sparse Monocular SLAM

Tomoyuki Mukasa, Jiu Xu, Björn Stenger
2017 IEEE International Conference on Computer Vision Workshops (ICCVW)
In this paper, we propose a novel framework for integrating geometric measurements from monocular visual simultaneous localization and mapping (SLAM) with depth predictions from a convolutional neural network (CNN). In our framework, SLAM-measured sparse feature points and CNN-predicted dense depth maps are fused to obtain a more accurate dense 3D reconstruction, including scale. We continuously update an initial 3D mesh by integrating accurately tracked sparse feature points. Compared to prior work on integrating SLAM and CNN estimates [26], there are two main differences: first, using a 3D mesh representation allows as-rigid-as-possible update transformations; second, we propose a system architecture suitable for mobile devices, in which the feature tracking and CNN-based depth prediction modules are separated and only the former runs on the device. We evaluate the framework by comparing the 3D reconstruction against 3D measurements obtained with an RGBD sensor, showing a 38% reduction in mean residual error compared to CNN-based depth map prediction alone.
doi:10.1109/iccvw.2017.112 dblp:conf/iccvw/MukasaXS17
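
As an illustration of the sparse-dense fusion the abstract describes, below is a minimal sketch of one simple way to reconcile a CNN-predicted dense depth map with sparse SLAM feature depths: estimating a single global scale from the median depth ratio at tracked feature pixels. This is not the authors' method (the paper updates a 3D mesh with as-rigid-as-possible transformations); the function name, arguments, and the global-scale assumption are illustrative only.

```python
# Sketch: globally rescale a CNN depth map so it agrees with sparse SLAM depths.
# Assumes a single scale factor suffices; the paper's mesh-based ARAP update
# is more flexible and is not reproduced here.
import numpy as np

def align_cnn_depth_to_slam(cnn_depth: np.ndarray,
                            feat_uv: np.ndarray,
                            feat_depth: np.ndarray) -> np.ndarray:
    """Rescale cnn_depth (H x W) to match sparse SLAM depths.

    feat_uv    : (N, 2) integer pixel coordinates (u, v) of tracked features
    feat_depth : (N,)   depths of those features taken from the SLAM map
    """
    u, v = feat_uv[:, 0], feat_uv[:, 1]
    cnn_at_feats = cnn_depth[v, u]                   # CNN depth at feature pixels
    valid = (cnn_at_feats > 0) & (feat_depth > 0)    # ignore invalid measurements
    scale = np.median(feat_depth[valid] / cnn_at_feats[valid])
    return scale * cnn_depth                         # dense depth at SLAM scale
```

In practice the rescaled dense depth would then serve as the initial mesh geometry, with subsequent refinement driven by the tracked feature points, as the abstract outlines.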