10,141 Hits in 3.6 sec

Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking [article]

Yidi Li, Hong Liu, Hao Tang
2021 arXiv   pre-print
In this paper, we propose a novel Multi-modal Perception Tracker (MPT) for speaker tracking using both audio and visual modalities. ... Multi-modal fusion has proven to be an effective way to improve the accuracy and robustness of speaker tracking, especially in complex scenarios. ... Speaker tracking is a foundational task for intelligent systems implementing behavior analysis and human-computer interaction. ...
arXiv:2112.07423v1 fatcat:3ed73idqsbggxhpmdmamfh3dxe

Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

Yue Zhao, Hui Wang, Qiang Ji
2012 International Journal of Advanced Robotic Systems  
Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of ... In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual ... input layer and the single-modality feature layers, one for audio data and one for visual data. ...
doi:10.5772/54000 fatcat:264bssoirfhn7jcn7whuqottjm

Sensor Fusion and Environmental Modelling for Multimodal Sentient Computing

Christopher Town, Zhigang Zhu
2007 2007 IEEE Conference on Computer Vision and Pattern Recognition  
Adaptive Multi-modal Fusion of Tracking Hypotheses: The dynamic component of the world model benefits from a high-level fusion of the visual and ultrasonic modalities for robust multi-object tracking and ... The achieved spatial granularity is better than 3 cm for >95% of Bat observations (assuming only small motion), and Bats may be polled using radio base stations and a variable quality of service to give ...
doi:10.1109/cvpr.2007.383526 dblp:conf/cvpr/TownZ07 fatcat:dfkkliujlnfxdconr5su6infhm

Semantic Interpretation of Multi-Modal Human-Behaviour Data

Mehul Bhatt, Kristian Kersting
2017 Künstliche Intelligenz  
... of large-scale, dynamic, multi-modal sensory data, or data streams. ... multi-modal sensory data relevant to a range of application domains and problem contexts where interpreting human behaviour is central. ... We remain grateful to all contributing authors for their persistence and effort across the two review rounds undertaken in preparing this special issue. ...
doi:10.1007/s13218-017-0511-y fatcat:dkfdb2pxgrcafatws6ibf7hute

Multi-Modal Data Fusion Techniques and Applications [chapter]

Alessio Dore, Matteo Pinasco, Carlo S. Regazzoni
2009 Multi-Camera Networks  
The integration of heterogeneous sensors can provide complementary and redundant information that, fused with visual cues, allows the system to obtain an enriched and more robust scene interpretation. ... A discussion of possible architectures and algorithms is proposed, showing through system examples the benefits of combining other sensor typologies in camera-network-based applications. ... Multi-modal Tracking using Bayesian Filtering: Multisensor data fusion for target tracking is a very active research domain because of its utility in a wide range of applications (for a survey on ...
doi:10.1016/b978-0-12-374633-7.00011-2 fatcat:drnoysj2fbaifcjdbiiukzbq6u
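The "Multi-modal Tracking using Bayesian Filtering" approach this chapter surveys can be illustrated with a minimal sketch (not the chapter's own code): a scalar Kalman filter that sequentially folds measurements from two sensors with different noise levels into one target-position estimate. The prior, sensor variances, and process noise below are illustrative assumptions.

```python
def kalman_fuse(x, P, measurements, variances, q=0.01):
    """Fuse several sensors' readings of one scalar state (e.g. a target's
    1-D position): one predict step, then a sequential update per sensor."""
    P = P + q                  # predict: constant-state model, process noise q
    for z, r in zip(measurements, variances):
        k = P / (P + r)        # Kalman gain for this sensor
        x = x + k * (z - x)    # state update toward the measurement
        P = (1.0 - k) * P      # covariance shrinks with each fused sensor
    return x, P

# A noisy camera (variance 0.5) and a more precise range sensor (variance 0.1)
# both observe a target near position 5.0; the prior estimate is 4.0.
x, P = kalman_fuse(x=4.0, P=1.0, measurements=[5.2, 5.0], variances=[0.5, 0.1])
# x is pulled toward 5.0 and P drops well below the prior variance of 1.0
```

Because the model is linear-Gaussian, updating with the sensors one at a time yields the same posterior as a joint update, which is why this per-sensor loop is a common pattern in multisensor fusion.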

Multi-sensory and Multi-modal Fusion for Sentient Computing

Christopher Town
2006 International Journal of Computer Vision  
This paper presents an approach to multi-sensory and multi-modal fusion in which computer vision information obtained from calibrated cameras is integrated with a large-scale sentient computing system ... It is shown that the fusion process significantly enhances the capabilities and robustness of both sensory modalities, thus enabling the system to maintain a richer and more accurate world model. ... Acknowledgements: The author gratefully acknowledges financial support from the Royal Commission for the Exhibition of 1851. ...
doi:10.1007/s11263-006-7834-8 fatcat:xmgx64sgsncpnjm6dlrxlrs6su

Preface: 3D GeoInfo 2021

L. Truong-Hong, F. Jia, E. Che, S. Emamgholian, D. Laefer, A. V. Vo
2021 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
Topics included: 3D data creation and acquisition; 3D data processing and analysis; 3D data management (data quality, metadata, provenance and trust); data integration, information fusion, and multi-modal data analysis; 3D visualization, including gamification, virtual reality, and augmented reality; 3D and Artificial Intelligence/Machine Learning; 3D and Big Data, parallel computing, cloud computing; 3D city modeling ...
doi:10.5194/isprs-archives-xlvi-4-w4-2021-1-2021 fatcat:kvd7ez4se5eshhbes33wfeqxgi

Preface: 3D GeoInfo 2021

L. Truong-Hong, F. Jia, E. Che, S. Emamgholian, D. Laefer, A. V. Vo
2021 ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
Topics included: 3D data creation and acquisition; 3D data processing and analysis; 3D data management (data quality, metadata, provenance and trust); data integration, information fusion, and multi-modal data analysis; 3D visualization, including gamification, virtual reality, and augmented reality; 3D and Artificial Intelligence/Machine Learning; 3D and Big Data, parallel computing, cloud computing; 3D city modeling ...
doi:10.5194/isprs-annals-viii-4-w2-2021-1-2021 fatcat:74th5oq5zvhoxd4npwby447fhm

Temporal Aggregation for Adaptive RGBT Tracking [article]

Zhangyong Tang, Tianyang Xu, Xiao-Jun Wu
2022 arXiv   pre-print
... performance. As for multi-modal tracking, constrained by the limited RGBT datasets, the adaptive fusion sub-network is appended to our method at the decision level to reflect the complementary characteristics ... In this paper, we propose an RGBT tracker which takes spatio-temporal clues into account for robust appearance model learning and simultaneously constructs an adaptive fusion sub-network for cross-modal ... Commonly, the key point of processing multi-modal data lies in the fusion of multi-modal representations. ...
arXiv:2201.08949v2 fatcat:shfvqqqixncltcbfaaofbhfi5a

Socializing Multimodal Sensors for Information Fusion

Yuhui Wang
2015 Proceedings of the 23rd ACM international conference on Multimedia - MM '15  
... multimodal sensors, and 3) a new social-cyber-physical paradigm where humans and sensors collaborate for event fusion. ... However, social sensor fusion for situation awareness is still in its infancy and lacks a unified framework to aggregate and composite real-time media streams from diverse sensors and social network platforms ... Moreover, multi-modal sensor fusion has been studied for combining multiple levels of information in various multimedia tasks [1]. ...
doi:10.1145/2733373.2807995 dblp:conf/mm/Wang15 fatcat:7g7rgxylprfijfgi7wwhwilig4

A Survey for Deep RGBT Tracking [article]

Zhangyong Tang
2022 arXiv   pre-print
Visual object tracking with visible (RGB) and thermal infrared (TIR) electromagnetic waves, abbreviated as RGBT tracking, has recently drawn increasing attention in the tracking community. ... This survey can be treated as a look-up table for researchers concerned with RGBT tracking. ... For fusion at the feature level, the multi-modal fusion is located between feature extraction and similarity evaluation. ...
arXiv:2201.09296v2 fatcat:bcmxwou4bjgupdipfzoh33q3ta
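The feature-level fusion this survey describes (fusion sitting between feature extraction and similarity evaluation) can be sketched as follows. This is a hypothetical illustration: random, untrained weights stand in for learned RGB and TIR feature extractors, and none of these names come from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract(x, W):
    """Stand-in feature extractor: a linear map with a ReLU."""
    return np.maximum(W @ x, 0.0)

# Hypothetical per-modality extractor weights (learned in a real tracker).
W_rgb = rng.standard_normal((16, 64))   # RGB patch -> 16-D feature
W_tir = rng.standard_normal((16, 64))   # TIR patch -> 16-D feature
W_fuse = rng.standard_normal((16, 32))  # mixes the concatenated features

def feature_level_fusion(rgb_patch, tir_patch):
    # Extract per modality, concatenate, then project: the fused feature
    # is what a similarity head would compare against the target template.
    f = np.concatenate([extract(rgb_patch, W_rgb), extract(tir_patch, W_tir)])
    return W_fuse @ f

fused = feature_level_fusion(rng.standard_normal(64), rng.standard_normal(64))
# fused.shape == (16,)
```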

Multi-modal biometrics for real-life person-specific emotional human-robot-interaction

Ahmad Rabie, Uwe Handmann
2014 2014 IEEE International Conference on Robotics and Biomimetics (ROBIO 2014)  
The robot can interact emotionally with its interaction partner by analyzing the partner's emotions as expressed either visually (facial expression) or auditively (speech prosody). ... The second approach is the targeting of such systems at merely one person. ... In current classification fusion research, two types of multi-modal fusion strategy are usually applied, namely feature-level fusion and decision-level fusion. ...
doi:10.1109/robio.2014.7090354 dblp:conf/robio/RabieH14 fatcat:x723wfn52vczletebx37gx7ibu
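The decision-level fusion named in the snippet can be sketched as a weighted average of per-modality class posteriors. The class labels and the modality weight below are illustrative assumptions, not values from the paper.

```python
def decision_level_fusion(face_probs, prosody_probs, w_face=0.5):
    """Each modality classifies on its own; the per-class posteriors are
    then combined by a weighted average and the best class is returned."""
    fused = [w_face * f + (1.0 - w_face) * p
             for f, p in zip(face_probs, prosody_probs)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused

# Hypothetical classes: 0=happy, 1=neutral, 2=angry. The face classifier
# favors "happy", the prosody classifier "neutral"; the face stream is
# trusted a little more (w_face=0.6).
label, fused = decision_level_fusion([0.6, 0.3, 0.1], [0.2, 0.5, 0.3],
                                     w_face=0.6)
# fused is approximately [0.44, 0.38, 0.18], so label == 0 ("happy")
```

Feature-level fusion, by contrast, would concatenate the modalities' features before a single classifier; decision-level fusion keeps the classifiers independent, which is convenient when one modality drops out.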

Interactive Multi-scale Fusion of 2D and 3D Features for Multi-object Tracking [article]

Guangming Wang, Chensheng Peng, Jinpeng Zhang, Hesheng Wang
2022 arXiv   pre-print
Moreover, the ablation studies indicate the effectiveness of multi-scale feature fusion and of pre-training on a single modality. ... Specifically, through multi-scale interactive query and fusion between pixel-level and point-level features, our method can obtain more distinguishing features to improve the performance of multiple-object ... Interactive Feature Fusion for Multi-Object Tracking ...
arXiv:2203.16268v1 fatcat:4egeregsjvavfgbrc2marm2wsm

Machine Perception and Learning Grand Challenge: Situational Intelligence Using Cross-Sensory Fusion

Shashi Phoha
2014 Frontiers in Robotics and AI  
Blasch et al. (2013) survey recent research efforts to accommodate the effects of context in information fusion for target tracking applications. ... The notion of context itself is often incoherent and ill-defined across sensing modalities and applications: image processing research generally assumes only the visual scene to be the context for object ... FA9550-12-1-0270 and by the Office of Naval Research (ONR) under Grant No. N00014-11-1-0893. ...
doi:10.3389/frobt.2014.00007 fatcat:4ihbm6yd3vb7tkwhsvmmwcmowu

Benchmark Driven Framework for Development of Emotion Sensing Support Systems

Senya Polikovsky, Maria Alejandra Quiros-Ramirez, Yoshinori Kameda, Yuichi Ohta, Judee Burgoon
2012 2012 European Intelligence and Security Informatics Conference  
This paper presents a new framework for the development of emotion-sensing support systems: a complete, easily extensible, flexible, and configurable environment with intensive benchmark capabilities ... It provides 1) an effective collaboration platform between technological and psychological researchers, and 2) intensive benchmarking capabilities to test the performance of the entire system as well as individual ... It also provides a complete overview of all tracking results across time, useful for debugging and data visualization. ...
doi:10.1109/eisic.2012.27 dblp:conf/eisic/PolikovskyQKOB12 fatcat:owbugw6mn5gzvguu36bpzfmvbm
Showing results 1 — 15 of 10,141 results