Image-Based Localization Aided Indoor Pedestrian Trajectory Estimation Using Smartphones

Yan Zhou, Xianwei Zheng, Ruizhi Chen, Hanjiang Xiong, Sheng Guo
2018 Sensors  
Accurately determining pedestrian location in indoor environments using consumer smartphones is a significant step in the development of ubiquitous localization services. Many different map-matching methods have been combined with pedestrian dead reckoning (PDR) to achieve low-cost and bias-free pedestrian tracking. However, this approach works only in areas with dense map constraints, and the error accumulates in open areas. In order to achieve reliable localization without map constraints, an improved image-based localization aided pedestrian trajectory estimation method is proposed in this paper. The image-based localization recovers the pose of the camera from the 2D-3D correspondences between the 2D image positions and the 3D points of the scene model, previously reconstructed by a structure-from-motion (SfM) pipeline. This enables us to determine the initial location and eliminate the accumulative error of PDR when an image is successfully registered. However, the image is not always registered, since the traditional 2D-to-3D matching rejects more and more correct matches when the scene becomes large. We thus adopt a robust image registration strategy that recovers initially unregistered images by integrating 3D-to-2D search. In the process, the visibility and co-visibility information is adopted to improve the efficiency when searching for the correspondences from both sides. The performance of the proposed method was evaluated through several experiments, and the results demonstrate that it can offer highly acceptable pedestrian localization results in long-term tracking, with an error of only 0.56 m, without the need for dedicated infrastructure.
gyroscope and visual odometer in GPS-challenging indoor spaces [10] and serve as a low-cost and high-accuracy solution in ubiquitous indoor localization. Vision-based localization has two main approaches: simultaneous localization and mapping (SLAM), as in Google Tango, and image-based localization. Compared with SLAM [11], reconstructing the scene model in advance and turning on the camera for localization only when the position is lost is a more appropriate approach for indoor pedestrian localization [12]. The image-based localization result can be directly treated as the pedestrian location because people tend to carry their smartphone close to their body.
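The drift-correction idea from the abstract can be illustrated with a minimal sketch. It assumes a simple step-and-heading PDR model in a local east/north frame and a hard position reset whenever an image is registered; the paper may fuse the two sources differently. The function names `pdr_step` and `track`, the heading convention, and the sample values are hypothetical and chosen only for illustration.

```python
import math


def pdr_step(position, step_length, heading_rad):
    """Advance an (east, north) position by one step.

    Assumption: heading is measured clockwise from north, in radians.
    """
    east, north = position
    return (east + step_length * math.sin(heading_rad),
            north + step_length * math.cos(heading_rad))


def track(start, steps, image_fixes):
    """Dead-reckon through `steps`, resetting to an image fix when one exists.

    steps: list of (step_length_m, heading_rad) tuples, one per detected step.
    image_fixes: dict mapping a step index to an (east, north) position
        obtained from a successfully registered image (hypothetical input).
    """
    position = start
    trajectory = [position]
    for index, (length, heading) in enumerate(steps):
        position = pdr_step(position, length, heading)
        if index in image_fixes:
            # Assumed correction rule: replace the drifting PDR estimate
            # with the image-based position.
            position = image_fixes[index]
        trajectory.append(position)
    return trajectory


if __name__ == "__main__":
    steps = [(0.7, 0.0), (0.7, math.pi / 2)]
    print(track((0.0, 0.0), steps, {}))
    print(track((0.0, 0.0), steps, {0: (1.0, 1.0)}))
```

Between image fixes the trajectory is purely relative, so any heading or step-length error accumulates until the next registered image replaces the estimate.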
Given the 2D image features and the 3D scene features, the camera pose can be estimated from the 2D-3D correspondences by applying an n-point pose solver inside a random sample consensus (RANSAC) loop [13]. Recent affordable or free structure-from-motion (SfM) software, such as Bundler [14], VisualSfM [15] and Photoscan [16], has allowed us to reconstruct indoor scenes and thus makes it possible to undertake image-based localization in indoor environments. When combined with pedestrian dead reckoning (PDR), which estimates the distance and heading of every step from the accelerometer and gyroscope embedded in the smartphone [17], discrete image-based localization results can be interpolated to recover a continuous pedestrian trajectory. Conversely, PDR provides only relative positions and accumulates error; the high-accuracy image localization result remedies both by providing the initial position and regular corrections when the trajectory drifts. Therefore, with the 3D scene model provided, image-based localization and PDR complement each other and achieve self-contained, high-accuracy localization using only smartphones, without any extra equipment.
Image-based localization was initially formulated as an image retrieval problem focused on matching a query image to an image database with geolocations [18]. When combined with the bag-of-visual-words model [19], an image retrieval system is applicable to scalable scenes, from the street level [20] to the city level [21] and the worldwide level [22]. Since the image database may contain thousands of millions of images, Li et al. [23] used an iconic scene graph to create a compact summary of the global images so that query images can be retrieved and localized efficiently. Chen et al. [24], on the other hand, improved the system's robustness to perspective views, and hence the recall rate, by fusing the orthogonal and perspective street images to build synthetic views.
Other improvements have focused on avoiding mismatches by dealing with repetitive scenes [25] and confusing scenes [26]. Compared to our method, the image retrieval strategy yields only a coarse location estimate. Furthermore, the raw images in the database are stored independently, ignoring the underlying geometry [27].
In contrast to the pure image retrieval approach, SfM-based localization can obtain accurate pose estimation with exact orientation and position by correlating 2D features in a query image with 3D scene features in the model. Moreover, the SfM model presents a precise summary of the scene, with each 3D point triangulated from a trace of matched features and the noisy ones eliminated and not used for the matching. Consequently, it accelerates the correspondence search because it contains orders of magnitude fewer points than there are features in the images [28].
The most popular correspondence search algorithm is 2D-to-3D matching, which directly uses the 2D descriptors as the query features to search for the corresponding 3D scene features based on the approximate nearest neighbor. This is followed by Lowe's ratio test [29] to eliminate the ambiguous matches. However, Lowe's ratio test tends to reject more and more correct matches as too ambiguous for larger scenes, since the descriptor space defined by the 3D points becomes denser [13]. Therefore, the 3D-to-2D approach, which inversely matches the 3D points in the model against the 2D features in the image, is adopted to register images. The ratio test of the 3D-to-2D algorithm is not sensitive to large scenes, as the descriptor space of a single image remains relatively constant and is not negatively affected by the density of the 3D model. Its efficiency, however, degrades as scenes become larger. At the core of correct SfM-based localization is the robust estimation of accurate 2D-3D matches.
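The effect of descriptor density on the ratio test can be shown with a small sketch. It uses an exact brute-force nearest-neighbor search on toy 2D vectors rather than the approximate nearest-neighbor search on real descriptors mentioned above; the function name `ratio_test_matches`, the default threshold of 0.8, and the sample points are assumptions made for illustration only.

```python
import math


def ratio_test_matches(queries, candidates, ratio=0.8):
    """Return (query_index, candidate_index) pairs passing Lowe's ratio test.

    A match is kept only if the nearest candidate is clearly closer than the
    second nearest: d1 < ratio * d2. The threshold value is an assumption.
    """
    matches = []
    for query_index, query in enumerate(queries):
        # Exact search for clarity; a real system would use an approximate
        # nearest-neighbor index instead.
        ranked = sorted(
            (math.dist(query, candidate), candidate_index)
            for candidate_index, candidate in enumerate(candidates)
        )
        if len(ranked) < 2:
            continue  # the ratio test needs two neighbors
        (best, best_index), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((query_index, best_index))
    return matches


if __name__ == "__main__":
    query = [(0.5, 0.0)]
    sparse_model = [(0.0, 0.0), (10.0, 10.0)]
    dense_model = sparse_model + [(1.0, 0.0)]  # a near-duplicate descriptor
    print(ratio_test_matches(query, sparse_model))
    print(ratio_test_matches(query, dense_model))
```

In the sparse model the query is matched to its true neighbor, while adding one similar descriptor makes the same match fail the test. Swapping the roles of `queries` and `candidates` gives the 3D-to-2D direction, where the candidate set is the feature set of a single image and therefore does not grow with the model.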
Due to large viewpoint changes and repetitive textures, using the above-mentioned correspondence search algorithm alone may fail to register an image affected by a high outlier ratio. In order to
doi:10.3390/s18010258 pmid:29342123 pmcid:PMC5795839 fatcat:3474ax677zffzd6jilxmny7q7q