Versatile Multi-Modal Pre-Training for Human-Centric Perception
[article]
2022
arXiv
pre-print
To this end, we propose the Human-Centric Multi-Modal Contrastive Learning framework HCMoCo, which leverages the multi-modal nature of human data (e.g., RGB, depth, 2D keypoints) for effective representation learning. The objective comes with two main challenges: dense pre-training for multi-modal data and efficient usage of sparse human priors. ...
An Overview of HCMoCo. a) We present HCMoCo, a versatile multi-modal pre-training framework that takes multi-modal observations of the human body as input for human-centric perception. ...
arXiv:2203.13815v1
fatcat:ncsu37gzfzg6xpdhzjt3na3zhi
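The cross-modal contrastive objective this abstract gestures at can be illustrated with a minimal symmetric InfoNCE loss between two modality encoders. This is a hedged sketch, not the authors' implementation: the feature dimensions, pairing scheme, and temperature below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def cross_modal_info_nce(rgb_emb, depth_emb, temperature=0.07):
    """InfoNCE between paired RGB and depth embeddings.

    rgb_emb, depth_emb: (N, D) features from two modality-specific
    encoders; pairs at the same batch index are positives.
    """
    rgb = F.normalize(rgb_emb, dim=1)
    depth = F.normalize(depth_emb, dim=1)
    logits = rgb @ depth.t() / temperature        # (N, N) similarity matrix
    targets = torch.arange(rgb.size(0), device=rgb.device)
    # Symmetric loss: match RGB -> depth and depth -> RGB.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Random features standing in for encoder outputs.
loss = cross_modal_info_nce(torch.randn(8, 128), torch.randn(8, 128))
```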
Deep Learning for Scene Classification: A Survey
[article]
2021
arXiv
pre-print
Pre-trained CNNs, as fixed feature extractors, are divided into two categories: object-centric and scene-centric CNNs. ...
Object-centric CNNs refer to models pre-trained on object datasets, e.g., ImageNet [56], and deployed for scene classification. ...
Currently, he is a Ph.D. student at the Center for Machine Vision and Signal Analysis (CMVS) of the University of Oulu, Finland. ...
arXiv:2101.10531v2
fatcat:hwqw5so46ngxdlnfw7zynmpu6m
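As a rough sketch of the "pre-trained CNN as fixed feature extractor" recipe the survey describes (an object-centric ImageNet backbone reused for scene classification), assuming a recent torchvision; the dataset size and classifier head are illustrative, not from the survey:

```python
import torch
import torch.nn as nn
from torchvision import models

# Object-centric backbone: ResNet-50 pre-trained on ImageNet.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Identity()          # expose the 2048-d pooled features
backbone.eval()
for p in backbone.parameters():      # freeze: fixed feature extractor
    p.requires_grad = False

# Lightweight scene classifier trained on top of the frozen features.
num_scene_classes = 67               # e.g. MIT Indoor-67 (illustrative)
classifier = nn.Linear(2048, num_scene_classes)

with torch.no_grad():
    feats = backbone(torch.randn(4, 3, 224, 224))  # (4, 2048)
logits = classifier(feats)
```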
Multimodal Conversational AI: A Survey of Datasets and Approaches
[article]
2022
arXiv
pre-print
As humans, we experience the world with all our senses or modalities (sound, sight, touch, smell, and taste). ...
Multimodal expressions are central to conversations; a rich set of modalities amplify and often compensate for each other. ...
(…, 2020), and text is encoded using the Google News pre-trained word2vec (Mikolov et al., 2013). ...
arXiv:2205.06907v1
fatcat:u6kehgeeq5aefdlvv5bpbwsvsa
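A minimal sketch of the word2vec text encoding mentioned in the snippet, assuming gensim and the publicly released GoogleNews-vectors-negative300.bin file. Mean-pooling word vectors is one common way to embed an utterance, not necessarily the surveyed papers' exact method.

```python
import numpy as np
from gensim.models import KeyedVectors

# Path to the released Google News vectors is an assumption here.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def encode_utterance(text):
    """Mean-pool 300-d word2vec vectors over in-vocabulary tokens."""
    vecs = [w2v[tok] for tok in text.split() if tok in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

emb = encode_utterance("show me nearby italian restaurants")
```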
Establishing human situation awareness using a multi-modal operator control unit in an urban search & rescue human-robot team
2011
2011 RO-MAN
Robots can potentially assist here, particularly when the hot zone is too dangerous for humans. ...
Early on in a disaster it is crucial for humans to make an assessment of the situation, to help determine further action. ...
The authors would also like to thank the end-user organizations, Vigili del Fuoco and FDDO, for their continuing support. ...
doi:10.1109/roman.2011.6005237
dblp:conf/ro-man/LarochelleKSMG11
fatcat:fkrz2osc6rcq3azqmgh57n4dwi
CVAE-H: Conditionalizing Variational Autoencoders via Hypernetworks and Trajectory Forecasting for Autonomous Driving
[article]
2022
arXiv
pre-print
We first evaluate CVAE-H on simple generative experiments to show that CVAE-H is probabilistic, multi-modal, context-driven, and general. ...
To best understand scene contexts and produce diverse possible future states of the road agents adaptively in different environments, a prediction model should be probabilistic, multi-modal, context-driven ...
Second, not all probabilistic models are multi-modal; a uni-modal Gaussian, for instance, is not. Third, to be fully context-driven, the model should leverage both social and spatial information. ...
arXiv:2201.09874v1
fatcat:djtr2prpjva2tliyyaxvrn3vkq
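To make the "conditionalizing via hypernetworks" idea concrete: a hypernetwork maps the conditioning context to the weights of a small decoder, so the decoder's parameters, not just its inputs, depend on the scene. The toy PyTorch sketch below is a hedged illustration; layer sizes and names are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class HyperDecoder(nn.Module):
    """Toy CVAE-style decoder whose output layer is generated
    by a hypernetwork from the conditioning context."""
    def __init__(self, ctx_dim=16, z_dim=8, out_dim=2, hidden=64):
        super().__init__()
        self.z_dim, self.out_dim = z_dim, out_dim
        # Hypernetwork: context -> (weights, bias) of a z -> out linear map.
        self.hyper = nn.Sequential(
            nn.Linear(ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, z_dim * out_dim + out_dim))

    def forward(self, z, ctx):
        params = self.hyper(ctx)                      # (B, z*out + out)
        W = params[:, :self.z_dim * self.out_dim]
        W = W.view(-1, self.out_dim, self.z_dim)      # per-sample weights
        b = params[:, self.z_dim * self.out_dim:]     # per-sample bias
        return torch.bmm(W, z.unsqueeze(-1)).squeeze(-1) + b

dec = HyperDecoder()
y = dec(torch.randn(4, 8), torch.randn(4, 16))        # (4, 2)
```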
Design, Implementation, and Evaluation of a Distance Learning Framework to Adapt to the Changing Landscape of Anatomy Instruction in Medical Education During COVID-19 Pandemic: A Proof-of-Concept Study
2021
Frontiers in Public Health
Using Bourdieu's Theory of Practice, we showed that the DL-framework is an efficient pedagogical approach, pertinent for medical schools to adopt, and versatile, as it attests to the key domains of students ...
In total, 70% students responded to the survey assessing perception toward DL (Kirkpatrick's Level: 1). ...
These findings indicate that medical schools should make the DL modality available to address students' learning needs, underscoring the need for a robust and versatile DL-framework. ...
doi:10.3389/fpubh.2021.726814
pmid:34568264
pmcid:PMC8460872
fatcat:pnklvdtxvvg7bpdto4fac72ktu
A Roadmap for Big Model
[article]
2022
arXiv
pre-print
We introduce 16 specific BM-related topics in those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory & Interpretability ...
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. ...
To simulate the intelligence of humans, it is necessary for models to train on large-scale multi-modal data. ...
arXiv:2203.14101v4
fatcat:rdikzudoezak5b36cf6hhne5u4
Multimodal representation models for prediction and control from partial information
[article]
2019
arXiv
pre-print
Similar to humans, robots benefit from interacting with their environment through a number of different sensor modalities, such as vision, touch, and sound. ...
Training multimodal models is not trivial due to the combinatorial complexity given by the possibility of missing modalities. ...
This architecture is trained with complete as well as partial data (see Table 1). Each uni-modal autoencoder can be trained separately, allowing for single modality learning. ...
arXiv:1910.03854v1
fatcat:jr4ub3lt4fcdjksynm7qto6vru
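The snippet's point about training with complete as well as partial data can be illustrated by dropping a modality at training time and reconstructing it from the others. The two-modality autoencoder below is a hedged toy under that assumption, not the paper's model.

```python
import torch
import torch.nn as nn

class TwoModalityAE(nn.Module):
    """Toy multimodal autoencoder: a shared latent fused from two
    modalities, with presence flags so either modality may be missing."""
    def __init__(self, d_vision=32, d_touch=8, d_latent=16):
        super().__init__()
        self.enc_v = nn.Linear(d_vision, d_latent)
        self.enc_t = nn.Linear(d_touch, d_latent)
        self.dec_v = nn.Linear(d_latent, d_vision)
        self.dec_t = nn.Linear(d_latent, d_touch)

    def forward(self, v, t, have_v=True, have_t=True):
        # Fuse whatever modalities are present by averaging their latents.
        parts = []
        if have_v:
            parts.append(self.enc_v(v))
        if have_t:
            parts.append(self.enc_t(t))
        z = torch.stack(parts).mean(dim=0)
        return self.dec_v(z), self.dec_t(z)

model = TwoModalityAE()
v, t = torch.randn(4, 32), torch.randn(4, 8)
# Partial data: touch missing; reconstruct both modalities from vision alone.
v_hat, t_hat = model(v, t, have_t=False)
loss = nn.functional.mse_loss(v_hat, v) + nn.functional.mse_loss(t_hat, t)
```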
Team CERBERUS Wins the DARPA Subterranean Challenge: Technical Overview and Lessons Learned
[article]
2022
arXiv
pre-print
In response to this challenge, we developed the CERBERUS system, which exploits the synergy of legged and flying robots, coupled with robust control, especially for overcoming perilous terrain, multi-modal and multi-robot perception for localization and mapping in conditions of sensor degradation, and resilient autonomy through unified exploration path planning and local motion planning that reflects robot-specific ...
We extend our gratitude to all the SubT Community and the DARPA team for the exciting challenge and collaborative community that was built. ...
arXiv:2207.04914v1
fatcat:lglpthomubdizfrf4n7mmw5jla
A Metaverse: taxonomy, components, applications, and open challenges
2022
IEEE Access
Finally, we summarize the limitations of and directions for implementing the immersive Metaverse, in terms of social influences, constraints, and open challenges. ...
The integration of enhanced social activities and neural-net methods requires a new definition of Metaverse suitable for the present, different from the previous Metaverse. ...
Other modalities tend to generate similar decoder representations and preserve more information in pre-trained text translation modules. Tang et al. ...
doi:10.1109/access.2021.3140175
fatcat:fnraeaz74vh33knfvhzrynesli
PyTorch Connectomics: A Scalable and Flexible Segmentation Framework for EM Connectomics
[article]
2021
arXiv
pre-print
… of unlabeled data during training. ...
Those functionalities can be easily realized in PyTC by changing the configuration options without coding and adapted to other 2D and 3D segmentation tasks for different tissues and imaging modalities. ...
Our framework is not designed for a specific imaging modality, data scale, or model architecture, but is versatile across different kinds of datasets and segmentation tasks, which distinguishes it from previous works ...
arXiv:2112.05754v1
fatcat:jjdatdvbr5eypmxmqbvujmarc4
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
[article]
2022
arXiv
pre-print
However, obtaining dense structured annotations for each frame is tedious and time-consuming, making these methods expensive to train and less scalable. ...
We propose a learning framework, StructureViT (SViT for short), which demonstrates how utilizing the structure of a small number of images, available only during training, can improve a video model. ...
Several other methods have explored simultaneous image and video training in the context of multi-task [6] and multi-modal [26] learning. ...
arXiv:2206.06346v2
fatcat:7owq7r3gjrbt3lznndrh3kbavq
Music in Extended Realities
2021
IEEE Access
Based on the results of the conducted review, a research agenda for the field is proposed. ...
This article both surveys and expands upon the knowledge accumulated in existing research in this area to build a foundation for future works that bring together Music and XR. ...
We briefly describe relatively common sensory modalities explored in Musical XR projects as follows:
1) Visual: Due in no small part to the physiological prominence human perception places on visual stimuli ...
doi:10.1109/access.2021.3052931
fatcat:5wprklpagnaunedqe75f2cjafe
Multimodal Human Hand Motion Sensing and Analysis -A Review
2018
IEEE Transactions on Cognitive and Developmental Systems
Human hand motion analysis is an essential research topic in recent applications, especially for dexterous robot hand manipulation learning from human hand skills. ...
… for hand motion sensing, including contact-based and non-contact-based approaches, are discussed, with comparisons of their pros and cons; then, the state-of-the-art analysis methods are introduced, with ...
Complex HHMs show more flexible and dexterous human in-hand operations, so it is more difficult to describe the process for multi-fingered manipulation [23] . ...
doi:10.1109/tcds.2018.2800167
fatcat:ojznwvn3gzg7rgnfq7yg3wf4jm
Decomposing NeRF for Editing via Feature Field Distillation
[article]
2022
arXiv
pre-print
Given a user-specified query of various modalities such as text, an image patch, or a point-and-click selection, 3D feature fields semantically decompose 3D space without the need for re-training and enable ...
However, editing a scene represented by a NeRF is challenging, as the underlying connectionist representations such as MLPs or voxel grids are not object-centric or compositional. ...
Acknowledgements We thank Tsukasa Takagi, Toshiki Nakanishi, Hiroharu Kato, and Masaaki Fukuda for helpful feedback. ...
arXiv:2205.15585v1
fatcat:v2gj3vho5rblzomdbv4su7octi
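A hedged sketch of the query mechanism this abstract describes: given per-point distilled features and an embedded query (text, image patch, or click), select the 3D points whose features are most similar. The cosine-similarity thresholding below is illustrative, not the paper's exact procedure, and the embeddings are stand-ins.

```python
import torch
import torch.nn.functional as F

def select_region(point_feats, query_emb, threshold=0.6):
    """Decompose 3D space by a feature-similarity query.

    point_feats: (N, D) distilled features sampled from the feature field.
    query_emb:   (D,) embedding of a text / image-patch / click query.
    Returns a boolean mask over the N points, with no re-training needed.
    """
    sims = F.cosine_similarity(point_feats, query_emb.unsqueeze(0), dim=1)
    return sims > threshold

# Illustrative usage with random stand-ins for field samples and a query.
mask = select_region(torch.randn(1000, 512), torch.randn(512))
```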