175 Hits in 7.1 sec

Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [article]

Xiurong Jiang, Lin Zhu, Yifan Hou, Hui Tian
2022 arXiv   pre-print
Then, through the attention-based feature interaction and serial multiscale dilated convolution (SDC) based feature fusion modules, the proposed model achieves the complementary interaction of low-level ... In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD. ... RGB-T provides new ideas for many challenging computer vision tasks [10], for example, RGB-T tracking, RGB-T crowd counting, and RGB-T person re-identification. ...
arXiv:2207.03558v1 fatcat:oyhitkbl75cgpgkignkywklwme

Siamese Network for RGB-D Salient Object Detection and Beyond [article]

Keren Fu, Deng-Ping Fan, Ge-Peng Ji, Qijun Zhao, Jianbing Shen, Ce Zhu
2021 arXiv   pre-print
Existing RGB-D salient object detection (SOD) models usually treat RGB and depth as independent information and design separate networks for feature extraction from each.  ...  We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models on the task of RGB-D SOD.  ...  fusing this support with the hierarchical multi-view features.  ... 
arXiv:2008.12134v2 fatcat:4dy5tf2yjngetox4f6x4fsau7q

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

Khaled Bayoudh, Raja Knani, Fayçal Hamdaoui, Abdellatif Mtibaa
2021 The Visual Computer  
In this paper, we seek to improve the understanding of key concepts and algorithms of deep multimodal learning for the computer vision community by exploring how to generate deep models that consider the integration and combination of heterogeneous visual cues across sensory modalities. ... (e.g., RGB-D sensors, stereo, etc.) has encouraged the computer vision community to focus on combining the RGB modality with other sensing modalities. ...
doi:10.1007/s00371-021-02166-7 pmid:34131356 pmcid:PMC8192112 fatcat:jojwyc6slnevzk7eaiutlmlgfe

The Fusion Strategy of 2D and 3D Information Based on Deep Learning: A Review

Jianghong Zhao, Yinrui Wang, Yuee Cao, Ming Guo, Xianfeng Huang, Ruiju Zhang, Xintong Dou, Xinyu Niu, Yuanyuan Cui, Jun Wang
2021 Remote Sensing  
Using 2D and 3D information fusion for the advantages of compensation and accuracy improvement has become a hot research topic. ... However, there are no critical reviews focusing on the fusion strategies of 2D and 3D information integration based on various data for segmentation and detection, which are the basic tasks of computer ... [82] for 2D Object Detection: This is a large-scale, hierarchical multi-view RGB-D object dataset, acquired by an RGB-D camera. ...
doi:10.3390/rs13204029 fatcat:onnjeqvwb5gsjcrhdaq6hiekru

Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges [article]

Di Feng, Christian Haase-Schuetz, Lars Rosenbaum, Heinz Hertlein, Claudius Glaeser, Fabian Timm, Werner Wiesbeck, Klaus Dietmayer
2020 arXiv   pre-print
In this context, many methods have been proposed for deep multi-modal perception problems.  ...  This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving.  ...  ACKNOWLEDGMENT We thank Fabian Duffhauss for collecting literature and reviewing the paper.  ... 
arXiv:1902.07830v4 fatcat:or6enjxktnamdmh2yekejjr4re

Multi-interactive Dual-decoder for RGB-thermal Salient Object Detection [article]

Zhengzheng Tu, Zhun Li, Chenglong Li, Yang Lang, Jin Tang
2021 arXiv   pre-print
In this paper, we propose a multi-interactive dual-decoder to mine and model the multi-type interactions for accurate RGBT SOD. ... Specifically, we first encode the two modalities into multi-level multi-modal feature representations. ... • We propose a unified model to seamlessly integrate the multi-type interactions for robust RGBT SOD. ...
arXiv:2005.02315v3 fatcat:ck2iowqlcfaw3orjykqoe42jha

Deep Learning for Face Anti-Spoofing: A Survey [article]

Zitong Yu, Yunxiao Qin, Xiaobai Li, Chenxu Zhao, Zhen Lei, Guoying Zhao
2022 arXiv   pre-print
RGB camera, we summarize the deep learning applications under multi-modal (e.g., depth and infrared) or specialized (e.g., light field and flash) sensors. ... It covers several novel and insightful components: 1) besides supervision with binary labels (e.g., '0' for bonafide vs. '1' for PAs), we also investigate recent methods with pixel-wise supervision (e.g., ... Representative multi-modal fusion and cross-modal translation approaches for FAS are collected in Table 13 (in Appendix). Multi-Modal Fusion. ...
arXiv:2106.14948v2 fatcat:wsheo7hbwvewhjoe6ykwjuqfii

DFTR: Depth-supervised Fusion Transformer for Salient Object Detection [article]

Heqin Zhu, Xu Sun, Yuexiang Li, Kai Ma, S. Kevin Zhou, Yefeng Zheng
2022 arXiv   pre-print
Experimental results show that our DFTR consistently outperforms the existing state-of-the-art methods for both RGB and RGB-D SOD tasks. The code and model will be made publicly available.  ...  Specifically, we develop a Depth-supervised Fusion TRansformer (DFTR), to further improve the accuracy of both RGB and RGB-D SOD.  ...  In particular, the encoders of triplet Transformer module share weights for multi-level feature enhancement, while the three-stream decoder is individually initialized for multi-modal fusion.  ... 
arXiv:2203.06429v2 fatcat:re2cpc2fv5gbxkblm4zlhkkxhe

RGB-T Image Saliency Detection via Collaborative Graph Learning [article]

Zhengzheng Tu, Tian Xia, Chenglong Li, Xiaoxiao Wang, Yan Ma, Jin Tang
2019 arXiv   pre-print
Fusing complementary RGB and thermal infrared data has been proven effective for image saliency detection. In this paper, we propose an effective approach for RGB-T image saliency detection. ... Moreover, we contribute a more challenging dataset for RGB-T image saliency detection, which contains 1000 spatially aligned RGB-T image pairs and their ground-truth annotations. ... In [3], they proposed a patch-based graph model to learn object feature representation for RGB-T tracking, where the graph is optimized via weighted sparse representations that utilize multi-modality information ...
arXiv:1905.06741v1 fatcat:eyt4zazp3bgnzkfk7xllkmab2y

2021 Index IEEE Transactions on Image Processing Vol. 30

2021 IEEE Transactions on Image Processing  
The Author Index contains the primary entry for each item, listed under the first author's name. ... Nikan, S., +, TIP 2021 739-753. Biomedical MRI: A Bilevel Integrated Model With Data-Driven Layer Ensemble for Multi-Modality Image Fusion. ..., +, TIP 2021 4919-4931. Computerized tomography: A Bilevel Integrated Model With Data-Driven Layer Ensemble for Multi-Modality Image Fusion. ...
doi:10.1109/tip.2022.3142569 fatcat:z26yhwuecbgrnb2czhwjlf73qu

Navigating an Automated Driving Vehicle via the Early Fusion of Multi-Modality

Malik Haris, Adam Glowacz
2022 Sensors  
This paper focuses on the early fusion of multi-modality and demonstrates how it outperforms a single modality using the CARLA simulator.  ...  The autonomous vehicle is equipped with a camera and active sensors, such as LiDAR and Radar, for safe navigation.  ...  Color images (RGB) and depth (D) are considered single modalities; however, RGB-D is considered multi-modal.  ... 
doi:10.3390/s22041425 pmid:35214327 pmcid:PMC8878300 fatcat:g7ti567yn5h6jbxbazzeoprnt4

Building Footprint Extraction From VHR Remote Sensing Images Combined With Normalized DSMs Using Fused Fully Convolutional Networks

Ksenia Bittner, Fathalrahman Adam, Shiyong Cui, Marco Korner, Peter Reinartz
2018 IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing  
The inputs to the proposed Fused-FCN4s are three-band (RGB), panchromatic (PAN), and normalized digital surface model (nDSM) images. ... Recently developed fully convolutional networks (FCNs) are similar to standard convolutional neural networks (CNNs), but the last fully connected layer is replaced by another convolution layer with ... ACKNOWLEDGMENTS The authors would like to thank Jiaojiao Tian for providing her software for normalized Digital Surface Model (nDSM) generation, and Pablo d'Angelo for providing the WorldView- ...
doi:10.1109/jstars.2018.2849363 fatcat:2orfxlpumfhi3plw54hy7iw2ui

Visual Sensation and Perception Computational Models for Deep Learning: State of the art, Challenges and Prospects [article]

Bing Wei, Yudi Zhao, Kuangrong Hao, Lei Gao
2021 arXiv   pre-print
Through this survey, we provide a comprehensive reference for research in this direction. ... Then, some views on the prospects of visual perception computational models are presented. ... [11] integrated a top-down multi-modal fusion network with an attention-aware cross-modal cross-level fusion (ACCF) block, which successfully solves the problem of RGB-D salient object detection ...
arXiv:2109.03391v1 fatcat:xtgda2x6azd2laun45tqfj77gi

Salient Object Detection With Lossless Feature Reflection and Weighted Structural Loss

Pingping Zhang, Wei Liu, Huchuan Lu, Chunhua Shen
2019 IEEE Transactions on Image Processing  
The location information of salient objects, together with contextual and semantic information, is jointly utilized to supervise the proposed network for more accurate saliency predictions. ... The coarse prediction results are effectively refined by this structural information for performance improvements. ... With hierarchical fusion, the resulting SFCN (model (e)) improves performance by about 2%. ...
doi:10.1109/tip.2019.2893535 fatcat:3hs7xomwjnhwzbinqyewo5fbdy

Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [article]

Yikai Wang, Wenbing Huang, Fuchun Sun, Fengxiang He, Dacheng Tao
2021 arXiv   pre-print
Extensive experiments on semantic segmentation via RGB-D data and image translation through multi-domain input verify the effectiveness of our CEN compared to current state-of-the-art methods. ... For the application of dense image prediction, the validity of CEN is tested in four different scenarios: multimodal fusion, cycle multimodal fusion, multitask learning, and multimodal multitask learning ... "Deep surface normal estimation with hierarchical RGB-D fusion," in CVPR, 2019. [51] S. Chennupati, G. Sistu, S. ...
arXiv:2112.02252v1 fatcat:ul4gs5dajjc5lecol6psabn4pu