Deep End-to-end 3D Person Detection from Camera and Lidar

Markus Roth, Dominik Jargot, Dariu M. Gavrila
2019 IEEE Intelligent Transportation Systems Conference (ITSC)
We present a method for 3D person detection from camera images and lidar point clouds in automotive scenes. The method comprises a deep neural network which estimates the 3D location and extent of persons present in the scene. 3D anchor proposals are refined in two stages: a region proposal network and a subsequent detection network. For both input modalities, high-level feature representations are learned from raw sensor data instead of being manually designed. To that end, we use Voxel Feature Encoders [1] to obtain point cloud features instead of widely used projection-based point cloud representations, thus allowing the network to learn to predict the location and extent of persons in an end-to-end manner. Experiments on the validation set of the KITTI 3D object detection benchmark [2] show that the proposed method outperforms state-of-the-art methods with an average precision (AP) of 47.06% on moderate difficulty.
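To make the voxel-based input concrete, the following is a minimal sketch of the grouping step that typically precedes a Voxel Feature Encoder: lidar points are binned into a regular 3D grid, and each point is augmented with its offset from the centroid of its voxel. It is not the authors' implementation. The function names `voxelize` and `augment`, the voxel size, and the (x, y, z, reflectance) point layout are assumptions chosen for illustration, and the learned encoder layers themselves are omitted.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size=(0.2, 0.2, 0.4)):
    """Group (x, y, z, reflectance) points by the voxel they fall into.

    voxel_size is a hypothetical grid resolution in metres, not a value
    taken from the paper.
    """
    voxels = defaultdict(list)
    for x, y, z, r in points:
        key = (
            floor(x / voxel_size[0]),
            floor(y / voxel_size[1]),
            floor(z / voxel_size[2]),
        )
        voxels[key].append((x, y, z, r))
    return dict(voxels)


def augment(voxel_points):
    """Append each point's offset from the voxel centroid to its features."""
    n = len(voxel_points)
    cx = sum(p[0] for p in voxel_points) / n
    cy = sum(p[1] for p in voxel_points) / n
    cz = sum(p[2] for p in voxel_points) / n
    return [
        (x, y, z, r, x - cx, y - cy, z - cz)
        for x, y, z, r in voxel_points
    ]


# Three toy points: two share a voxel, one lies in a different voxel.
points = [
    (0.05, 0.05, 0.1, 0.5),
    (0.15, 0.05, 0.1, 0.7),
    (1.10, 1.10, 1.0, 0.2),
]
voxels = voxelize(points)
features = augment(voxels[(0, 0, 0)])
```

In a full pipeline, the per-voxel feature lists would be passed through learned layers and pooled into one feature vector per voxel; this sketch stops before that step.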
doi:10.1109/itsc.2019.8917366 dblp:conf/itsc/RothJG19 fatcat:2kubr3qfg5hz7jyj3ntgdrzylq