
Versatile Multi-Modal Pre-Training for Human-Centric Perception [article]

Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu
2022 arXiv   pre-print
To this end, we propose the Human-Centric Multi-Modal Contrastive Learning framework HCMoCo, which leverages the multi-modal nature of human data (e.g., RGB, depth, 2D keypoints) for effective representation learning. The objective comes with two main challenges: dense pre-training for multi-modal data and efficient usage of sparse human priors.  ...  Overview of HCMoCo: a versatile multi-modal pre-training framework that takes multi-modal observations of the human body as input for human-centric perception.  ...
arXiv:2203.13815v1 fatcat:ncsu37gzfzg6xpdhzjt3na3zhi
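The cross-modal contrastive objective behind pre-training frameworks like this one can be illustrated with a minimal InfoNCE sketch. The encoder outputs, batch construction, and temperature below are illustrative assumptions, not HCMoCo's actual design.

```python
# Minimal sketch of a cross-modal InfoNCE loss, as used in contrastive
# multi-modal pre-training. Dimensions and inputs are stand-ins, not
# HCMoCo's actual architecture.
import torch
import torch.nn.functional as F

def cross_modal_info_nce(rgb_emb, depth_emb, temperature=0.07):
    """Pull matching (RGB, depth) pairs together, push mismatches apart."""
    rgb = F.normalize(rgb_emb, dim=-1)      # (B, D) unit-norm embeddings
    depth = F.normalize(depth_emb, dim=-1)  # (B, D)
    logits = rgb @ depth.t() / temperature  # (B, B) pairwise similarities
    targets = torch.arange(rgb.size(0), device=rgb.device)
    # Symmetric loss: RGB -> depth and depth -> RGB retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Random features standing in for encoder outputs of two modalities.
loss = cross_modal_info_nce(torch.randn(8, 128), torch.randn(8, 128))
```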

Deep Learning for Scene Classification: A Survey [article]

Delu Zeng, Minyu Liao, Mohammad Tavakolian, Yulan Guo, Bolei Zhou, Dewen Hu, Matti Pietikäinen, Li Liu
2021 arXiv   pre-print
Pre-trained CNNs, as fixed feature extractors, are divided into two categories: object-centric and scene-centric CNNs.  ...  Object-centric CNNs refer to models pre-trained on object datasets, e.g., ImageNet [56], and deployed for scene classification.  ...  Currently, he is a Ph.D. student at the Center for Machine Vision and Signal Analysis (CMVS) of the University of Oulu, Finland.  ...
arXiv:2101.10531v2 fatcat:hwqw5so46ngxdlnfw7zynmpu6m
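The "pre-trained CNN as fixed feature extractor" pattern the survey describes can be sketched as below. The ImageNet-weighted ResNet-50 is one common object-centric choice (assuming torchvision ≥ 0.13); a scene-centric variant would swap in weights trained on a scene dataset such as Places.

```python
# Sketch of an object-centric CNN (ImageNet-pre-trained ResNet-50) used
# as a *fixed* feature extractor for scene classification.
import torch
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()    # drop the ImageNet classifier head
backbone.eval()
for p in backbone.parameters():      # freeze: features only, no fine-tuning
    p.requires_grad = False

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)  # stand-in for a scene batch
    features = backbone(images)           # (4, 2048) global descriptors

# A lightweight classifier (e.g., a linear probe or SVM) is then trained
# on `features` for the scene categories.
```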

Multimodal Conversational AI: A Survey of Datasets and Approaches [article]

Anirudh Sundar, Larry Heck
2022 arXiv   pre-print
As humans, we experience the world with all our senses or modalities (sound, sight, touch, smell, and taste).  ...  Multimodal expressions are central to conversations; a rich set of modalities amplify and often compensate for each other.  ...  and text is encoded using the Google News pre-trained word2vec (Mikolov et al., 2013).  ...
arXiv:2205.06907v1 fatcat:u6kehgeeq5aefdlvv5bpbwsvsa
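Encoding text with the Google News pre-trained word2vec, as the snippet mentions, typically looks like the gensim sketch below. The vectors file path is a placeholder, and averaging word vectors is one simple sentence-level scheme, not necessarily what the surveyed systems do.

```python
# Sketch of encoding text with Google News pre-trained word2vec via gensim.
import numpy as np
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)  # placeholder path

def encode(sentence: str) -> np.ndarray:
    """Mean of 300-d word vectors, skipping out-of-vocabulary tokens."""
    vecs = [w2v[w] for w in sentence.lower().split() if w in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

vec = encode("Multimodal expressions are central to conversations")
```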

Establishing human situation awareness using a multi-modal operator control unit in an urban search & rescue human-robot team

Benoit Larochelle, Geert-Jan M. Kruijff, Nanja Smets, Tina Mioch, Peter Groenewegen
2011 2011 RO-MAN  
Robots can potentially assist humans here, particularly when the hot zone is too dangerous for humans.  ...  Early on in a disaster, it is crucial for humans to make an assessment of the situation to help determine further action.  ...  The authors would also like to thank the end-user organizations, Vigili del Fuoco and FDDO, for their continuing support.  ...
doi:10.1109/roman.2011.6005237 dblp:conf/ro-man/LarochelleKSMG11 fatcat:fkrz2osc6rcq3azqmgh57n4dwi

CVAE-H: Conditionalizing Variational Autoencoders via Hypernetworks and Trajectory Forecasting for Autonomous Driving [article]

Geunseob Oh, Huei Peng
2022 arXiv   pre-print
We first evaluate CVAE-H on simple generative experiments to show that CVAE-H is probabilistic, multi-modal, context-driven, and general.  ...  To best understand scene contexts and produce diverse possible future states of the road agents adaptively in different environments, a prediction model should be probabilistic, multi-modal, context-driven  ...  Second, not all probabilistic models are multi-modal; a uni-modal Gaussian, for instance, is not. Third, to be fully context-driven, the model should leverage both social and spatial information.  ...
arXiv:2201.09874v1 fatcat:djtr2prpjva2tliyyaxvrn3vkq
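The snippet's point that not all probabilistic models are multi-modal is easy to see numerically: a single Gaussian fit to two plausible futures puts its mode between them, where no real future lies. The 1-D "trajectory endpoints" below are made up purely for illustration.

```python
# Uni-modal vs. multi-modal: a Gaussian fit to two distinct futures
# (e.g., turn left vs. turn right) predicts a future that never occurs.
import numpy as np

rng = np.random.default_rng(0)
# Two equally likely future outcomes (illustrative 1-D endpoints).
left, right = rng.normal(-2.0, 0.1, 500), rng.normal(+2.0, 0.1, 500)
futures = np.concatenate([left, right])

uni_mean, uni_std = futures.mean(), futures.std()
print(f"uni-modal Gaussian fit: mean={uni_mean:.2f}, std={uni_std:.2f}")
# mean ~ 0.0: the single mode lies between the two real outcomes. A
# two-component mixture (means ~ -2 and +2) keeps both modes, which is
# why multi-modal output heads (mixtures, CVAEs) matter for forecasting.
```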

Design, Implementation, and Evaluation of a Distance Learning Framework to Adapt to the Changing Landscape of Anatomy Instruction in Medical Education During COVID-19 Pandemic: A Proof-of-Concept Study

Nerissa Naidoo, Aida J. Azar, Amar Hassan Khamis, Mandana Gholami, Marjam Lindsbro, Alawi Alsheikh-Ali, Yajnavalka Banerjee
2021 Frontiers in Public Health  
Using Bourdieu's Theory of Practice, we showed that the DL-framework is an efficient pedagogical approach, pertinent for medical schools to adopt, and versatile in that it attests to the key domains of students  ...  In total, 70% of students responded to the survey assessing perception toward DL (Kirkpatrick's Level 1).  ...  These findings indicate that medical schools should make the DL modality available to address students' learning needs, underscoring the need for a robust and versatile DL-framework.  ...
doi:10.3389/fpubh.2021.726814 pmid:34568264 pmcid:PMC8460872 fatcat:pnklvdtxvvg7bpdto4fac72ktu

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
We introduce 16 specific BM-related topics across those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory & Interpretability  ...  With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm.  ...  To simulate the intelligence of humans, it is necessary for models to train on large-scale multi-modal data.  ...
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4

Multimodal representation models for prediction and control from partial information [article]

Martina Zambelli, Antoine Cully, Yiannis Demiris
2019 arXiv   pre-print
Similar to humans, robots benefit from interacting with their environment through a number of different sensor modalities, such as vision, touch, and sound.  ...  Training multimodal models is not trivial due to the combinatorial complexity introduced by the possibility of missing modalities.  ...  This architecture is trained with complete as well as partial data (see Table 1). Each uni-modal autoencoder can be trained separately, allowing for single-modality learning.  ...
arXiv:1910.03854v1 fatcat:jr4ub3lt4fcdjksynm7qto6vru
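The training regime described in the snippet, exposing the model to complete as well as partial inputs, can be sketched with random modality masking. The shared-latent architecture, layer sizes, and fusion-by-averaging scheme below are generic assumptions for illustration, not the paper's exact network.

```python
# Sketch: a shared-latent multimodal autoencoder trained on complete
# *and* partial data via random modality masking.
import torch
import torch.nn as nn

dims = {"vision": 64, "touch": 16, "sound": 32}
latent = 24
enc = nn.ModuleDict({m: nn.Linear(d, latent) for m, d in dims.items()})
dec = nn.ModuleDict({m: nn.Linear(latent, d) for m, d in dims.items()})
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()),
                       lr=1e-3)

def step(batch):
    # Randomly drop modalities so the model learns to reconstruct all
    # streams from whatever subset is observed (keep at least one).
    present = [m for m in dims if torch.rand(()) > 0.5] or ["vision"]
    z = torch.stack([enc[m](batch[m]) for m in present]).mean(0)  # fuse
    loss = sum(nn.functional.mse_loss(dec[m](z), batch[m]) for m in dims)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

batch = {m: torch.randn(8, d) for m, d in dims.items()}
print(step(batch))
```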

Team CERBERUS Wins the DARPA Subterranean Challenge: Technical Overview and Lessons Learned [article]

Marco Tranzatto, Mihir Dharmadhikari, Lukas Bernreiter, Marco Camurri, Shehryar Khattak, Frank Mascarich, Patrick Pfreundschuh, David Wisth, Samuel Zimmermann, Mihir Kulkarni, Victor Reijgwart, Benoit Casseau (+24 others)
2022 arXiv   pre-print
In response to this challenge, we developed the CERBERUS system, which exploits the synergy of legged and flying robots, coupled with robust control, especially for overcoming perilous terrain, multi-modal and multi-robot perception for localization and mapping under sensor degradation, and resilient autonomy through unified exploration path planning and local motion planning that reflects robot-specific  ...  We extend our gratitude to all of the SubT Community and the DARPA team for the exciting challenge and the collaborative community that was built.  ...
arXiv:2207.04914v1 fatcat:lglpthomubdizfrf4n7mmw5jla

A Metaverse: taxonomy, components, applications, and open challenges

Sang-Min Park, Young-Gab Kim
2022 IEEE Access  
Finally, we summarize the limitations of and directions for implementing the immersive Metaverse in terms of social influences, constraints, and open challenges.  ...  The integration of enhanced social activities and neural-net methods requires a new definition of the Metaverse suitable for the present, different from the previous Metaverse.  ...  Other modalities tend to generate similar decoder representations and preserve more information in pre-trained text translation modules. Tang et al.  ...
doi:10.1109/access.2021.3140175 fatcat:fnraeaz74vh33knfvhzrynesli

PyTorch Connectomics: A Scalable and Flexible Segmentation Framework for EM Connectomics [article]

Zudi Lin, Donglai Wei, Jeff Lichtman, Hanspeter Pfister
2021 arXiv   pre-print
... of unlabeled data during training.  ...  Those functionalities can be easily realized in PyTC by changing the configuration options without coding, and they can be adapted to other 2D and 3D segmentation tasks for different tissues and imaging modalities.  ...  Our framework is not designed for a specific imaging modality, data scale, or model architecture, but is versatile across different kinds of datasets and segmentation tasks, which distinguishes it from previous works  ...
arXiv:2112.05754v1 fatcat:jjdatdvbr5eypmxmqbvujmarc4

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens [article]

Elad Ben-Avraham, Roei Herzig, Karttikeya Mangalam, Amir Bar, Anna Rohrbach, Leonid Karlinsky, Trevor Darrell, Amir Globerson
2022 arXiv   pre-print
However, obtaining dense structured annotations for each frame is tedious and time-consuming, making these methods expensive to train and less scalable.  ...  We propose a learning framework StructureViT (SViT for short), which demonstrates how utilizing the structure of a small number of images only available during training can improve a video model.  ...  Several other methods have explored simultaneous image and video training in the context of multi-task [6] and multi-modal [26] learning.  ... 
arXiv:2206.06346v2 fatcat:7owq7r3gjrbt3lznndrh3kbavq
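The simultaneous image and video training mentioned in the snippet can be illustrated with mixed-batch training: video clips drive the main task loss, while a small set of structured images adds an auxiliary loss on shared parameters. The backbone, heads, losses, and weighting below are generic stand-ins, not SViT's actual components.

```python
# Sketch of joint image/video training with an auxiliary structure loss
# on shared parameters. All modules and losses are illustrative.
import torch
import torch.nn as nn

backbone = nn.Linear(768, 768)        # stand-in for a shared transformer
video_head = nn.Linear(768, 10)       # e.g., action classification
structure_head = nn.Linear(768, 4)    # e.g., object box regression
opt = torch.optim.Adam(
    [*backbone.parameters(), *video_head.parameters(),
     *structure_head.parameters()], lr=1e-4)

clip_tokens = torch.randn(2, 16, 768)  # (batch, frames, dim) video clips
clip_labels = torch.randint(0, 10, (2,))
img_tokens = torch.randn(4, 768)       # structured-annotation images
img_boxes = torch.rand(4, 4)

video_loss = nn.functional.cross_entropy(
    video_head(backbone(clip_tokens).mean(1)), clip_labels)
structure_loss = nn.functional.l1_loss(
    structure_head(backbone(img_tokens)), img_boxes)
loss = video_loss + 0.5 * structure_loss  # hypothetical weighting
opt.zero_grad(); loss.backward(); opt.step()
```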

Music in Extended Realities

Luca Turchet, Rob Hamilton, Anil Camci
2021 IEEE Access  
Based on the results of the conducted review, a research agenda for the field is proposed.  ...  This article both surveys and expands upon the knowledge accumulated in existing research in this area to build a foundation for future works that bring together Music and XR.  ...  We briefly describe relatively common sensory modalities explored in Musical XR projects as follows: 1) Visual: due in no small part to the physiological prominence human perception places on visual stimuli  ...
doi:10.1109/access.2021.3052931 fatcat:5wprklpagnaunedqe75f2cjafe

Multimodal Human Hand Motion Sensing and Analysis -A Review

Yaxu Xue, Zhaojie Ju, Kui Xiang, Jing Chen, Honghai Liu
2018 IEEE Transactions on Cognitive and Developmental Systems  
Human hand motion analysis is an essential research topic in recent applications, especially for dexterous robot hands that learn manipulation from human hand skills.  ...  ... for hand motion sensing, including contact-based and non-contact-based approaches, are discussed, with comparisons of their pros and cons; then, the state-of-the-art analysis methods are introduced, with  ...  Complex HHMs show more flexible and dexterous human in-hand operations, so it is more difficult to describe the process of multi-fingered manipulation [23].  ...
doi:10.1109/tcds.2018.2800167 fatcat:ojznwvn3gzg7rgnfq7yg3wf4jm

Decomposing NeRF for Editing via Feature Field Distillation [article]

Sosuke Kobayashi, Eiichi Matsumoto, Vincent Sitzmann
2022 arXiv   pre-print
Given a user-specified query of various modalities such as text, an image patch, or a point-and-click selection, 3D feature fields semantically decompose 3D space without the need for re-training and enable  ...  However, editing a scene represented by a NeRF is challenging, as the underlying connectionist representations such as MLPs or voxel grids are not object-centric or compositional.  ...  Acknowledgements We thank Tsukasa Takagi, Toshiki Nakanishi, Hiroharu Kato, and Masaaki Fukuda for helpful feedback.  ... 
arXiv:2205.15585v1 fatcat:v2gj3vho5rblzomdbv4su7octi
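The query mechanism in this abstract, matching a user-supplied embedding against a distilled 3D feature field, reduces to a similarity test per 3D point. Everything below (the feature field as a random MLP, the query vector, the threshold) is a stand-in for illustration, not the paper's model.

```python
# Sketch of querying a distilled 3D feature field: score each point by
# cosine similarity between its feature and a query embedding (e.g.,
# from a text or image encoder), then threshold to select a region.
import torch
import torch.nn.functional as F

feature_field = torch.nn.Sequential(   # maps (x, y, z) -> 512-d feature
    torch.nn.Linear(3, 256), torch.nn.ReLU(), torch.nn.Linear(256, 512))

points = torch.rand(1024, 3)                   # sampled 3D locations
query = F.normalize(torch.randn(512), dim=0)   # stand-in query embedding

feats = F.normalize(feature_field(points), dim=-1)
scores = feats @ query                  # cosine similarity per point
mask = scores > 0.2                     # hypothetical threshold
print(f"selected {int(mask.sum())} / {len(points)} points")
```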
Showing results 1 — 15 out of 928 results