Disentangle, align and fuse for multimodal and semi-supervised image segmentation [article]

Agisilaos Chartsias, Giorgos Papanastasiou, Chengjia Wang, Scott Semple, David E. Newby, Rohan Dharmakumar, Sotirios A. Tsaftaris
2020 arXiv   pre-print
We present a method that offers improved segmentation accuracy of the modality of interest (over a single input model), by learning to leverage information present in other modalities, even if few (semi-supervised  ...  Taking advantage of the common information shared between modalities (an organ's anatomy) is beneficial for multi-modality processing and learning.  ...  This has been made possible by disentangling images into semantic anatomy factors that are consistently represented across modalities and modality factors that model the intensity variability of the multimodal  ... 
arXiv:1911.04417v4 fatcat:qxlay6fzz5fdlcpta2epygydf4

Disentangled representation learning in cardiac image analysis

Agisilaos Chartsias, Thomas Joyce, Giorgos Papanastasiou, Scott Semple, Michelle Williams, David E. Newby, Rohan Dharmakumar, Sotirios A. Tsaftaris
2019 Medical Image Analysis  
Specifically, we show that our model can match the performance of fully supervised segmentation models, using only a fraction of the labelled images.  ...  To explore the properties of the learned factorisation, we perform latent-space arithmetic and show that we can synthesise CT from MR and vice versa, by swapping the modality factors.  ...  Tsaftaris acknowledges the support of the Royal Academy of Engineering and the Research Chairs and Senior Research Fellowships scheme.  ... 
doi:10.1016/ pmid:31351230 pmcid:PMC6815716 fatcat:amrltox6svgk7oemhjivkyzfly

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions [article]

Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha
2021 arXiv   pre-print
The modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality using transfer of knowledge between modalities, including their representations and predictive  ...  In the current state of multimodal machine learning, the assumptions are that all modalities are present, aligned, and noiseless during training and testing time.  ...  Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared  ... 
arXiv:2107.13782v2 fatcat:s4spofwxjndb7leqbcqnwbifq4

Disentangled Representation Learning in Cardiac Image Analysis [article]

Agisilaos Chartsias, Thomas Joyce, Giorgos Papanastasiou, Michelle Williams, David Newby, Rohan Dharmakumar, Sotirios A. Tsaftaris
2019 arXiv   pre-print
Specifically, we show that our model can match the performance of fully supervised segmentation models, using only a fraction of the labelled images.  ...  To explore the properties of the learned factorisation, we perform latent-space arithmetic and show that we can synthesise CT from MR and vice versa, by swapping the modality factors.  ...  Tsaftaris acknowledges the support of the Royal Academy of Engineering and the Research Chairs and Senior Research Fellowships scheme.  ... 
arXiv:1903.09467v4 fatcat:lsdtpg2cove5thk4r35osw2gni

Image-to-Image Translation: Methods and Applications [article]

Yingxue Pang, Jianxin Lin, Tao Qin, Zhibo Chen
2021 arXiv   pre-print
I2I has drawn increasing attention and made tremendous progress in recent years because of its wide range of applications in many computer vision and image processing problems, such as image synthesis,  ...  Additionally, we will elaborate on the effect of I2I on the research and industry community and point out remaining challenges in related fields.  ...  [100] disentangled the representation of two domains into three parts: the shared part containing common information of both domains, and two exclusive parts that only represent those factors of variation  ... 
arXiv:2101.08629v2 fatcat:i6pywjwnvnhp3i7cmgza2slnle

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications [article]

Chao Zhang, Zichao Yang, Xiaodong He, Li Deng
2020 arXiv   pre-print
Therefore, it is of broad interest to study the more difficult and complex problem of modeling and learning across multiple modalities.  ...  In this paper, we provide a technical review of available models and learning methods for multimodal intelligence.  ...  This work is partially supported by Beijing Academy of Artificial Intelligence (BAAI).  ... 
arXiv:1911.03977v3 fatcat:ojazuw3qzvfqrdweul6qdpxuo4

Unsupervised Multi-Domain Multimodal Image-to-Image Translation with Explicit Domain-Constrained Disentanglement [article]

Weihao Xia, Yujiu Yang, Jing-Hao Xue
2019 arXiv   pre-print
Furthermore, we also investigate how to better extract domain supervision information so as to learn better disentangled representations and achieve better image translation.  ...  We also found in experiments that the implicit disentanglement of content and style could lead to unexpected results.  ...  Various improvements have been proposed to handle challenges in GANs including model generalization and training stability.  ... 
arXiv:1911.00622v1 fatcat:dn64bf2ndjbwfa564yk6k7ttmi

Latent Structure Mining with Contrastive Modality Fusion for Multimedia Recommendation [article]

Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Mengqi Zhang, Shu Wu, Liang Wang
2022 arXiv   pre-print
To be specific, we devise a novel modality-aware structure learning module, which learns item-item relationships for each modality.  ...  We argue that the latent semantic item-item structures underlying these multimodal contents could be beneficial for learning better item representations and assist the recommender models to comprehensively  ...  fused representations, and thus the fused multimodal representations can adaptively capture item-item relationships shared between multiple modalities in a self-supervised manner.  ... 
arXiv:2111.00678v2 fatcat:boqsb2twpjd45gbtol5tpkirqa

On the Limitations of Multimodal VAEs [article]

Imant Daunhawer, Thomas M. Sutter, Kieran Chin-Cheong, Emanuele Palumbo, Julia E. Vogt
2022 arXiv   pre-print
We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models.  ...  Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data.  ...  All of the used datasets are either public or can be generated from publicly available resources using the code that we provide in the supplementary material.  ... 
arXiv:2110.04121v2 fatcat:uylvpkukifglzcwz5gcunmr7bu

Sense and Learn: Self-Supervision for Omnipresent Sensors [article]

Aaqib Saeed, Victor Ungureanu, Beat Gfeller
2021 arXiv   pre-print
level of generalization on a task of interest.  ...  In this work, we leverage the self-supervised learning paradigm towards realizing the vision of continual learning from unlabeled inputs.  ...  ACKNOWLEDGEMENTS The authors would like to thank Félix de Chaumont Quitry, Marco Tagliasacchi and Richard F. Lyon for their valuable feedback and help with this work.  ... 
arXiv:2009.13233v2 fatcat:ver2i7o5zvgv3boterps4tqxcu

Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion [article]

Yang Wang
2020 arXiv   pre-print
Finally, we share our viewpoints regarding some future directions on this field.  ...  Throughout this survey, we further indicate that the critical components for this field go to collaboration, adversarial competition and fusion over multi-modal spaces.  ...  [110] studied a multi-view generation method named CR-GAN, which is a two-pathway learning model leveraging labeled and unlabeled data for self-supervised learning to improve generation quality.  ... 
arXiv:2006.08159v1 fatcat:g4467zmutndglmy35n3eyfwxku

Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [article]

Yikai Wang, Wenbing Huang, Fuchun Sun, Fengxiang He, Dacheng Tao
2021 arXiv   pre-print
preserving the specific patterns of each modality (resp. task).  ...  Specifically, the channel exchanging process is self-guided by individual channel importance that is measured by the magnitude of Batch-Normalization (BN) scaling factor during training.  ... 
arXiv:2112.02252v1 fatcat:ul4gs5dajjc5lecol6psabn4pu

DISSECT: DISentangle SharablE ConTent for Multimodal Integration and Crosswise-mapping [article]

Geoffrey Schau, Erik Ames Burlingame, Young Hwan Chang
2020 bioRxiv   pre-print
In this work, we motivate a formal justification for domain-specific information separation in a simple linear case and illustrate that a self-supervised approach enables domain translation between data  ...  We introduce an orthogonal gate block designed to enforce orthogonality of input feature sets by explicitly removing non-sharable information specific to each domain and illustrate separability of domain-specific  ...  autoencoding architecture and a separate gate layer to identify domain-specific information in a self-supervised manner.  ... 
doi:10.1101/2020.09.04.283234 fatcat:ox43wq2fuzfn3ponxxngjytofe

Deep Learning for Face Anti-Spoofing: A Survey [article]

Zitong Yu, Yunxiao Qin, Xiaobai Li, Chenxu Zhao, Zhen Lei, Guoying Zhao
2021 arXiv   pre-print
RGB camera, we summarize the deep learning applications under multi-modal (e.g., depth and infrared) or specialized (e.g., light field and flash) sensors.  ...  ., pseudo depth map); 2) in addition to traditional intra-dataset evaluation, we collect and analyze the latest methods specially designed for domain generalization and open-set FAS; and 3) besides commercial  ...  (No. 2020YFC2003901), and the National Natural Science Foundation of China (No. 61876178, 61872367, and 61806196).  ... 
arXiv:2106.14948v1 fatcat:o2rkploxuzfs3lbievb5t6ycqm

Deep Generative Adversarial Networks for Image-to-Image Translation: A Review

Aziz Alotaibi
2020 Symmetry  
It also discusses and analyzes current state-of-the-art image-to-image translation techniques that are based on multimodal and multidomain representations.  ...  Image-to-image translation with generative adversarial networks (GANs) has been intensively studied and applied to various tasks, such as multimodal image-to-image translation, super-resolution translation  ...  Self-Attention GAN. SAGAN [72] has been proposed to incorporate a self-attention mechanism into a convolutional GAN framework to improve the quality of generated images.  ... 
doi:10.3390/sym12101705 fatcat:rqlwjjhrvbc6fhc4mxjjvkwk6i