Short-Term Prediction and Multi-Camera Fusion on Semantic Grids [article]

Lukas Hoyer, Patrick Kesper, Anna Khoreva, Volker Fischer
2019 arXiv   pre-print
In particular, we provide a proof of concept for the spatio-temporal fusion of multiple camera sequences and short-term prediction in such an environment representation (ER).  ...  We evaluate this representation on real-world sequences of the Cityscapes dataset and show that our architecture can make accurate predictions in complex sensor fusion scenarios and significantly outperforms  ...  To the best of our knowledge, we are the first to investigate a deep convolutional neural network for short-term prediction and multi-camera sequence fusion on semantic grids.  ...
arXiv:1903.08960v2 fatcat:bgmoivvuufhopflyo6mewczs3y
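
The mechanism the snippet above describes, predicting a future semantic grid from a short history of past grids, can be illustrated with a toy convolutional predictor. This is a minimal sketch under assumed shapes, not the paper's architecture; `SemanticGridPredictor` and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class SemanticGridPredictor(nn.Module):
    """Toy short-term predictor: given T past one-hot semantic grids,
    predict per-class logits for the next time step."""
    def __init__(self, num_classes: int, history: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes * history, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_classes, 1),  # per-cell class logits
        )

    def forward(self, grids: torch.Tensor) -> torch.Tensor:
        # grids: (B, T, C, H, W) -> stack the history along channels
        b, t, c, h, w = grids.shape
        return self.net(grids.reshape(b, t * c, h, w))

# Example: 4 past frames of a 10-class, 128x128 semantic grid.
pred = SemanticGridPredictor(num_classes=10, history=4)(torch.rand(1, 4, 10, 128, 128))
print(pred.shape)  # torch.Size([1, 10, 128, 128])
```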

FISHING Net: Future Inference of Semantic Heatmaps In Grids [article]

Noureldin Hendy, Cooper Sloan, Feng Tian, Pengfei Duan, Nick Charchut, Yuesong Xie, Chuang Wang, James Philbin
2020 arXiv   pre-print
In this work we predict short-term semantic grids, but the framework can be extended to other tasks.  ...  In this work, we present an end-to-end pipeline that performs semantic segmentation and short-term prediction using a top-down representation.  ...  We use a network for each of the k modalities to do short-term prediction of semantic grids into the future.  ...
arXiv:2006.09917v1 fatcat:nlpsa74odvgddlkielutn3zimu
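
A minimal sketch of the per-modality scheme the FISHING Net snippet describes: each of the k modalities produces its own semantic-grid prediction, and the grids are then aggregated. Averaging class probabilities below is an assumed placeholder fusion, not necessarily the paper's aggregation rule; `fuse_modalities` is a hypothetical helper.

```python
import torch

def fuse_modalities(per_modality_logits):
    """Fuse k per-modality semantic-grid predictions into one grid.
    per_modality_logits: list of (B, C, H, W) logits, one per modality.
    Averaging class probabilities is a stand-in for the paper's
    aggregation; any permutation-invariant reduction fits here."""
    probs = [l.softmax(dim=1) for l in per_modality_logits]
    return torch.stack(probs, dim=0).mean(dim=0)

lidar, radar, camera = (torch.randn(1, 5, 96, 96) for _ in range(3))
fused = fuse_modalities([lidar, radar, camera])
print(fused.shape, float(fused.sum(dim=1).mean()))  # (1, 5, 96, 96), ~1.0
```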

Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges [article]

Di Feng, Christian Haase-Schuetz, Lars Rosenbaum, Heinz Hertlein, Claudius Glaeser, Fabian Timm, Werner Wiesbeck, Klaus Dietmayer
2020 arXiv   pre-print
To this end, we first provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection and semantic segmentation in autonomous driving research.  ...  This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving.  ...  We also thank Bill Beluch, Rainer Stal, Peter Möller and Ulrich Michael for their suggestions and inspiring discussions.  ... 
arXiv:1902.07830v4 fatcat:or6enjxktnamdmh2yekejjr4re

Multi-view Supervision for Single-View Reconstruction via Differentiable Ray Consistency

Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, Jitendra Malik
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
We show that this formulation can be incorporated in a learning framework to leverage different types of multi-view observations, e.g. foreground masks, depth, color images, semantics, etc., as supervision for learning single-view 3D prediction.  ...  This work was supported in part by Intel/NSF VEC award IIS-1539099, NSF Award IIS-1212798, and the Berkeley Fellowship to ST.  ...
doi:10.1109/cvpr.2017.30 dblp:conf/cvpr/TulsianiZEM17 fatcat:kvvkkk75ljdj7fbzlwif4ik3na
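
The ray-consistency formulation above can be made concrete: for a ray crossing N cells with occupancy probabilities x_i, the probability that the ray terminates at cell i is x_i * prod_{j<i}(1 - x_j), and the loss is the expected cost over termination events. The sketch below follows that structure; `ray_consistency_loss` and the example costs are illustrative, not the authors' code.

```python
import torch

def ray_consistency_loss(occ: torch.Tensor, cost: torch.Tensor) -> torch.Tensor:
    """Differentiable per-ray loss. occ: (N,) occupancy probabilities of
    the N cells a ray passes through, ordered from the camera outward.
    cost: (N+1,) cost of the ray terminating at each cell (last entry:
    ray escapes). The loss is the expected cost under the termination
    distribution q_i = occ_i * prod_{j<i}(1 - occ_j)."""
    free = torch.cumprod(1.0 - occ, dim=0)                 # prob. of passing cells 0..i
    pass_before = torch.cat([occ.new_ones(1), free[:-1]])  # prod_{j<i}(1 - occ_j)
    q = torch.cat([occ * pass_before, free[-1:]])          # termination + escape probs
    return (q * cost).sum()

occ = torch.tensor([0.1, 0.2, 0.9, 0.5], requires_grad=True)
cost = torch.tensor([1.0, 1.0, 0.0, 1.0, 1.0])  # e.g. observed depth at cell 2
ray_consistency_loss(occ, cost).backward()
print(occ.grad)  # gradients push occupancy toward the observed termination
```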

EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [article]

Zhe Liu, Tengteng Huang, Bingling Li, Xiwu Chen, Xi Wang, Xiang Bai
2022 arXiv   pre-print
In this paper, we propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module and a Multi-Modal Consistency (MC) loss.  ...  More concretely, the proposed CB-Fusion module enriches the semantic information of point features with image features in a cascade bi-directional interaction fusion manner, leading to more  ...  We roughly divide the existing fusion methods into three categories: multi-view fusion, voxel-based and image fusion, and point-based and image fusion. Multi-View Fusion Methods.  ...
arXiv:2112.11088v3 fatcat:tnpyffousvhntbnbrcxv34jcni
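
As a rough illustration of bi-directional point-image fusion of the kind the EPNet++ snippet names, the sketch below gates each modality's features by the other and adds them residually. This is a hedged stand-in, not the actual CB-Fusion module; `BiDirectionalFusion` and its shapes are hypothetical.

```python
import torch
import torch.nn as nn

class BiDirectionalFusion(nn.Module):
    """Minimal stand-in for bi-directional fusion: image features
    (sampled at each point's projection) gate and enrich the point
    features, and point features gate the image features in return."""
    def __init__(self, dim: int):
        super().__init__()
        self.img_gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.pts_gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, pts_feat, img_feat):
        joint = torch.cat([pts_feat, img_feat], dim=-1)
        pts_out = pts_feat + self.img_gate(joint) * img_feat  # image -> points
        img_out = img_feat + self.pts_gate(joint) * pts_feat  # points -> image
        return pts_out, img_out

pts, img = torch.randn(1, 1024, 64), torch.randn(1, 1024, 64)
p, i = BiDirectionalFusion(64)(pts, img)
print(p.shape, i.shape)  # (1, 1024, 64) each
```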

Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [article]

Aditya Prakash, Kashyap Chitta, Andreas Geiger
2021 arXiv   pre-print
Therefore, we propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.  ...  In this work, we demonstrate that imitation learning policies based on existing sensor fusion methods under-perform in the presence of a high density of dynamic agents and complex scenarios, which require  ...  Furthermore, we observe even better performance on the short routes when replacing the independent feature extractors of the image and LiDAR branches with a multi-scale geometry-based fusion encoder.  ...
arXiv:2104.09224v1 fatcat:au3nqx7kwfds7kqymx2jceq5ga
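
A minimal sketch of attention-based sensor fusion in the spirit of TransFuser: flatten both feature maps into token sequences, concatenate them, and let self-attention mix the modalities. `AttentionFusion` and all dimensions are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Sketch of transformer-based sensor fusion: image and LiDAR-BEV
    feature maps are flattened into tokens, concatenated, and mixed by
    self-attention so each modality attends to the other."""
    def __init__(self, dim: int = 64, heads: int = 4, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, img_feat, lidar_feat):
        # both: (B, C, H, W) -> (B, H*W, C) token sequences
        tokens = torch.cat([f.flatten(2).transpose(1, 2)
                            for f in (img_feat, lidar_feat)], dim=1)
        fused = self.encoder(tokens)
        n = img_feat.shape[2] * img_feat.shape[3]
        return fused[:, :n], fused[:, n:]   # split back per modality

img, lidar = torch.randn(1, 64, 8, 8), torch.randn(1, 64, 8, 8)
fi, fl = AttentionFusion()(img, lidar)
print(fi.shape, fl.shape)  # (1, 64, 64) each
```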

HDMapNet: An Online HD Map Construction and Evaluation Framework [article]

Qi Li, Yue Wang, Yilun Wang, Hang Zhao
2022 arXiv   pre-print
HDMapNet encodes image features from surrounding cameras and/or point clouds from LiDAR, and predicts vectorized map elements in the bird's-eye view.  ...  Of note, our camera-LiDAR fusion-based HDMapNet outperforms existing methods by more than 50% in all metrics.  ...  We first conduct short-term temporal fusion by pasting feature maps of previous frames into the current frame according to the ego poses. The feature maps are fused by max pooling and then fed into the decoder.  ...
arXiv:2107.06307v4 fatcat:dyzicwdkxjfjjlm3uavfg37i6i
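
The temporal-fusion step quoted above (warp previous BEV feature maps into the current frame via ego poses, then max-pool) can be sketched with standard grid sampling. The 2x3 affine poses in normalized grid coordinates are an assumption for illustration; `temporal_max_fusion` is hypothetical.

```python
import torch
import torch.nn.functional as F

def temporal_max_fusion(feats, rel_poses):
    """Sketch of short-term temporal fusion on BEV features: warp each
    frame's feature map into the current frame using the relative ego
    pose (2x3 affine, in normalized grid coordinates), then fuse by
    elementwise max pooling over time.
    feats: (T, C, H, W); rel_poses: (T, 2, 3)."""
    grid = F.affine_grid(rel_poses, feats.shape, align_corners=False)
    warped = F.grid_sample(feats, grid, align_corners=False)
    return warped.max(dim=0).values  # (C, H, W)

feats = torch.randn(3, 32, 64, 64)
poses = torch.eye(2, 3).expand(3, 2, 3).clone()
poses[:, 0, 2] = torch.tensor([0.0, 0.05, 0.10])  # small ego translations
print(temporal_max_fusion(feats, poses).shape)  # torch.Size([32, 64, 64])
```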

Recurrent-OctoMap: Learning State-based Map Refinement for Long-Term Semantic Mapping with 3D-Lidar Data [article]

Li Sun and Zhi Yan and Anestis Zaganidis and Cheng Zhao and Tom Duckett
2018 arXiv   pre-print
Most existing semantic mapping approaches focus on improving semantic understanding of single frames, rather than 3D refinement of semantic maps (i.e. fusing semantic observations).  ...  This paper presents a novel semantic mapping approach, Recurrent-OctoMap, learned from long-term 3D Lidar data.  ...  Francois Pomerleau and team for generously sharing their data. We also thank NVIDIA Corporation for donating a high-power GPU on which this work was performed.  ... 
arXiv:1807.00925v2 fatcat:nxmi22tc7vhtfeynpxv7f5m4ee
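
A toy rendering of the per-cell recurrent fusion idea in the entry above: each map cell's stream of per-frame semantic observations feeds an LSTM whose state fuses them into a refined label. `RecurrentCellFusion`, the hidden size, and the input format are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RecurrentCellFusion(nn.Module):
    """Toy per-cell semantic fusion in the spirit of Recurrent-OctoMap:
    each map cell keeps an LSTM whose inputs are successive per-class
    observation vectors; its state fuses observations over time."""
    def __init__(self, num_classes: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(num_classes, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (cells, T, num_classes) noisy per-frame class scores
        out, _ = self.lstm(obs_seq)
        return self.head(out[:, -1])  # refined logits after the last frame

cells = RecurrentCellFusion(num_classes=6)
print(cells(torch.rand(100, 20, 6)).shape)  # torch.Size([100, 6])
```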

Unifying Voxel-based Representation with Transformer for 3D Object Detection [article]

Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia
2022 arXiv   pre-print
It surpasses previous work in single- and multi-modality entries and achieves leading performance on the nuScenes test set with 69.7%, 55.1%, and 71.1% NDS for LiDAR, camera, and multi-modality inputs,  ...  Different from previous work, our approach preserves the voxel space without height compression to alleviate semantic ambiguity and enable spatial interactions.  ...  However, in the short term, the current technique cannot solve all corner cases and extreme situations, which may bring potential risk to the decision process in real-world autonomous systems.  ...
arXiv:2206.00630v1 fatcat:uz6ej6qxzvaaxieqg2jkyjyfae
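
The height-compression point in the snippet above is easy to show on tensors: BEV pipelines typically fold the vertical axis into channels (or pool it away), while a voxel-preserving approach keeps Z explicit so 3D operators can still act along height. The shapes below are arbitrary examples, not the paper's configuration.

```python
import torch

# Voxel features from a LiDAR/camera backbone: (B, C, Z, Y, X).
vox = torch.randn(1, 64, 8, 128, 128)

# Common BEV pipelines compress height, folding Z into the channels (or
# pooling it away), which can blur classes that overlap in the BEV plane:
bev = vox.flatten(1, 2)          # (1, 64*8, 128, 128) -- height compressed

# Preserving the voxel space keeps Z as an explicit axis, so queries and
# 3D convolutions can still separate objects stacked vertically:
kept = torch.nn.functional.conv3d(
    vox, torch.randn(64, 64, 3, 3, 3), padding=1)  # (1, 64, 8, 128, 128)
print(bev.shape, kept.shape)
```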

Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning

Abdelhak Loukkal, Yves Grandvalet, Tom Drummond, You Li
2021 IEEE Winter Conference on Applications of Computer Vision (WACV)  
Also adopting conditional imitation learning, [29] uses multi-modal inputs and explores different fusion schemes.  ...
doi:10.1109/wacv48630.2021.00010 fatcat:6rnbnw4l6bae3ltwapotkt6qnm

The Fusion Strategy of 2D and 3D Information Based on Deep Learning: A Review

Jianghong Zhao, Yinrui Wang, Yuee Cao, Ming Guo, Xianfeng Huang, Ruiju Zhang, Xintong Dou, Xinyu Niu, Yuanyuan Cui, Jun Wang
2021 Remote Sensing  
However, there are no critical reviews focusing on the fusion strategies of 2D and 3D information integration based on various data for segmentation and detection, which are the basic tasks of computer  ...  Fusing 2D and 3D information to exploit their complementary advantages and improve accuracy has become a hot research topic.  ...  A long short-term memory (LSTM) layer and an LSTM-based fusion layer are designed for global context feature learning.  ...
doi:10.3390/rs13204029 fatcat:onnjeqvwb5gsjcrhdaq6hiekru

Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning [article]

Abdelhak Loukkal
2020 arXiv   pre-print
To ease the prediction of OGMs in BEV from camera images, we introduce a novel scheme where the OGMs are first predicted as semantic masks in camera view and then warped into BEV using the homography between  ...  Occupancy Grid Maps (OGMs).  ...
arXiv:2008.04047v1 fatcat:qgv5b4rm55brbi67ymesk6l3lu
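
The camera-view-to-BEV step this snippet describes (predict semantic masks in camera view, then warp them with a ground-plane homography) can be sketched with OpenCV. The homography values below are placeholders; in practice H is derived from the camera intrinsics and its pose relative to the ground plane.

```python
import cv2
import numpy as np

# A camera-view semantic mask (e.g. road = 1) to be warped into BEV.
mask = np.zeros((480, 640), dtype=np.uint8)
mask[240:, :] = 1  # everything below the horizon, as a toy example

# H maps ground-plane pixels in the image to BEV grid cells. The values
# here are placeholders; a real H comes from the camera intrinsics and
# extrinsics with respect to the ground plane.
H = np.array([[1.0, 0.5, -160.0],
              [0.0, 2.0, -480.0],
              [0.0, 0.002, 1.0]])

bev = cv2.warpPerspective(mask, H, (400, 400),
                          flags=cv2.INTER_NEAREST)  # keep labels discrete
print(bev.shape, bev.max())
```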

Integration of Multi-Camera Video Moving Objects and GIS

Xie, Wang, Liu, Mao, Wang
2019 ISPRS International Journal of Geo-Information  
To address the aforementioned drawbacks, on the basis of multi-camera video moving object extraction, this paper first analyzed the characteristics of different video-GIS information fusion methods and  ...  This work discusses the integration of multi-camera video moving objects (MCVO) and GIS.  ...  the video object trajectory and sub-graph data, thereby resolving the defects of the grid interface of multi-camera videos.  ...
doi:10.3390/ijgi8120561 fatcat:qi2db3coxzablaezqmovrgtfhi

Panoptic Multi-TSDFs: a Flexible Representation for Online Multi-resolution Volumetric Mapping and Long-term Dynamic Scene Consistency [article]

Lukas Schmid, Jeffrey Delmerico, Johannes Schönberger, Juan Nieto, Marc Pollefeys, Roland Siegwart, Cesar Cadena
2022 arXiv   pre-print
Through reasoning on the object level, semantic consistency over time is achieved.  ...  For robotic interaction in environments shared with other agents, access to volumetric and semantic maps of the scene is crucial.  ...  They are segmented based on motion cues or semantic segmentation and reconstructed using surfel fusion.  ... 
arXiv:2109.10165v2 fatcat:hcpgw3z6pjbj3ijqa4hjr6x7em
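
Volumetric mappers of this kind build on the standard TSDF integration update, a weighted running average of truncated signed-distance observations per voxel. The sketch below shows that classic (Curless-Levoy style) update rather than the paper's submap machinery; `integrate_tsdf` and its shapes are illustrative.

```python
import numpy as np

def integrate_tsdf(D, W, d, w=1.0, trunc=0.1):
    """Standard TSDF integration step that volumetric mappers build on:
    fuse a new truncated signed-distance observation d into the running
    voxel average D with weight accumulator W."""
    d = np.clip(d, -trunc, trunc)            # truncate the SDF observation
    D_new = (W * D + w * d) / (W + w)        # weighted running average
    return D_new, W + w

D = np.zeros((4, 4, 4)); W = np.zeros((4, 4, 4))
for _ in range(10):                          # ten noisy depth integrations
    obs = 0.05 + 0.01 * np.random.randn(4, 4, 4)
    D, W = integrate_tsdf(D, W, obs)
print(D.mean(), W[0, 0, 0])                  # ~0.05, 10.0
```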