Disentangle, align and fuse for multimodal and semi-supervised image segmentation
[article]
2020
arXiv
pre-print
We present a method that offers improved segmentation accuracy of the modality of interest (over a single input model), by learning to leverage information present in other modalities, even if few (semi-supervised ...
Taking advantage of the common information shared between modalities (an organ's anatomy) is beneficial for multi-modality processing and learning. ...
This has been made possible by disentangling images into semantic anatomy factors that are consistently represented across modalities and modality factors that model the intensity variability of the multimodal ...
arXiv:1911.04417v4
fatcat:qxlay6fzz5fdlcpta2epygydf4
Disentangled representation learning in cardiac image analysis
2019
Medical Image Analysis
Specifically, we show that our model can match the performance of fully supervised segmentation models, using only a fraction of the labelled images. ...
To explore the properties of the learned factorisation, we perform latent-space arithmetic and show that we can synthesise CT from MR and vice versa, by swapping the modality factors. ...
Tsaftaris acknowledges the support of the Royal Academy of Engineering and the Research Chairs and Senior Research Fellowships scheme. ...
doi:10.1016/j.media.2019.101535
pmid:31351230
pmcid:PMC6815716
fatcat:amrltox6svgk7oemhjivkyzfly
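A minimal sketch of the factor swap this entry (and the disentanglement in the entry above it) describes: encode each image into an anatomy factor and a modality factor, then decode with the factors crossed. The callable names below are hypothetical stand-ins, not the paper's actual API.

```python
# Hypothetical callables standing in for the paper's model components:
#   anatomy_enc(image)  -> modality-invariant anatomy factor s
#   modality_enc(image) -> low-dimensional modality (appearance) factor z
#   decoder(s, z)       -> image rendered from anatomy s with appearance z
def swap_modality_factors(mr_image, ct_image, anatomy_enc, modality_enc, decoder):
    s_mr, s_ct = anatomy_enc(mr_image), anatomy_enc(ct_image)
    z_mr, z_ct = modality_enc(mr_image), modality_enc(ct_image)
    # Keep each image's anatomy but render it with the other modality's factor:
    fake_ct = decoder(s_mr, z_ct)  # MR anatomy, CT appearance
    fake_mr = decoder(s_ct, z_mr)  # CT anatomy, MR appearance
    return fake_ct, fake_mr
```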
Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
[article]
2021
arXiv
pre-print
The modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality using transfer of knowledge between modalities, including their representations and predictive ...
In the current state of multimodal machine learning, the assumptions are that all modalities are present, aligned, and noiseless during training and testing time. ...
CRediT authorship contribution statement
Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared ...
arXiv:2107.13782v2
fatcat:s4spofwxjndb7leqbcqnwbifq4
Disentangled Representation Learning in Cardiac Image Analysis
[article]
2019
arXiv
pre-print
Specifically, we show that our model can match the performance of fully supervised segmentation models, using only a fraction of the labelled images. ...
To explore the properties of the learned factorisation, we perform latent-space arithmetic and show that we can synthesise CT from MR and vice versa, by swapping the modality factors. ...
Tsaftaris acknowledges the support of the Royal Academy of Engineering and the Research Chairs and Senior Research Fellowships scheme. ...
arXiv:1903.09467v4
fatcat:lsdtpg2cove5thk4r35osw2gni
Image-to-Image Translation: Methods and Applications
[article]
2021
arXiv
pre-print
I2I has drawn increasing attention and made tremendous progress in recent years because of its wide range of applications in many computer vision and image processing problems, such as image synthesis, ...
Additionally, we will elaborate on the effect of I2I on the research and industry community and point out remaining challenges in related fields. ...
[100] disentangled the representation of two domains into three parts: the shared part containing common information of both domains, and two exclusive parts that only represent those factors of variation ...
arXiv:2101.08629v2
fatcat:i6pywjwnvnhp3i7cmgza2slnle
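To make the three-part split in the last snippet concrete: one shared content code plus one exclusive code per domain. The encoders below are illustrative assumptions, not code from [100].

```python
# Hypothetical encoders for a two-domain, three-part disentanglement:
#   shared_enc   -> content common to both domains
#   excl_enc_a/b -> factors of variation exclusive to domain A / domain B
def encode_three_parts(x_a, x_b, shared_enc, excl_enc_a, excl_enc_b):
    c_a, c_b = shared_enc(x_a), shared_enc(x_b)  # shared part, per domain
    e_a = excl_enc_a(x_a)                        # exclusive to domain A
    e_b = excl_enc_b(x_b)                        # exclusive to domain B
    # Translation A -> B would decode (c_a, e_b); B -> A decodes (c_b, e_a).
    return c_a, c_b, e_a, e_b
```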
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
[article]
2020
arXiv
pre-print
Therefore, it is of broad interest to study the more difficult and complex problem of modeling and learning across multiple modalities. ...
In this paper, we provide a technical review of available models and learning methods for multimodal intelligence. ...
This work is partially supported by Beijing Academy of Artificial Intelligence (BAAI). ...
arXiv:1911.03977v3
fatcat:ojazuw3qzvfqrdweul6qdpxuo4
Unsupervised Multi-Domain Multimodal Image-to-Image Translation with Explicit Domain-Constrained Disentanglement
[article]
2019
arXiv
pre-print
Furthermore, we also investigate how to better extract domain supervision information so as to learn better disentangled representations and achieve better image translation. ...
We also found in experiments that the implicit disentanglement of content and style could lead to unexpected results. ...
Various improvements have been proposed to handle challenges in GANs, including model generalization and training stability. ...
arXiv:1911.00622v1
fatcat:dn64bf2ndjbwfa564yk6k7ttmi
Latent Structure Mining with Contrastive Modality Fusion for Multimedia Recommendation
[article]
2022
arXiv
pre-print
To be specific, we devise a novel modality-aware structure learning module, which learns item-item relationships for each modality. ...
We argue that the latent semantic item-item structures underlying these multimodal contents could be beneficial for learning better item representations and assist the recommender models to comprehensively ...
... fused representations, and thus the fused multimodal representations can adaptively capture item-item relationships shared between multiple modalities in a self-supervised manner. ...
arXiv:2111.00678v2
fatcat:boqsb2twpjd45gbtol5tpkirqa
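A minimal sketch of what a modality-aware structure learning module could look like, assuming cosine-similarity kNN item-item graphs per modality fused by a weighted sum; the function, the top-k rule, and the fusion weight are assumptions, not the paper's exact formulation.

```python
import torch

def modality_item_graph(item_emb, k=10):
    """Build a top-k item-item affinity matrix for one modality.

    item_emb: (num_items, dim) embeddings of items in this modality.
    """
    emb = torch.nn.functional.normalize(item_emb, dim=1)
    sim = emb @ emb.t()                         # (N, N) cosine similarities
    topk = sim.topk(k + 1, dim=1)               # +1 because self is included
    adj = torch.zeros_like(sim)
    adj.scatter_(1, topk.indices, topk.values)  # keep only top-k neighbours
    return adj

# Fusing graphs from, e.g., visual and textual modalities:
# fused = alpha * modality_item_graph(vis_emb) + (1 - alpha) * modality_item_graph(txt_emb)
```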
On the Limitations of Multimodal VAEs
[article]
2022
arXiv
pre-print
We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. ...
Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. ...
All of the used datasets are either public or can be generated from publicly available resources using the code that we provide in the supplementary material. ...
arXiv:2110.04121v2
fatcat:uylvpkukifglzcwz5gcunmr7bu
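Schematically, the sub-sampled objective can be written as follows for a mixture-based multimodal VAE with modalities $x_{1:M}$ and unimodal posteriors $q_\phi(z \mid x_m)$; this is a generic rendering, not the paper's exact theorem statement.

```latex
\mathcal{L}_{\mathrm{sub}}(x_{1:M})
  = \frac{1}{M} \sum_{m=1}^{M}
    \mathbb{E}_{q_\phi(z \mid x_m)}
    \!\left[ \log \frac{p_\theta(x_{1:M} \mid z)\, p(z)}{q_\phi(z \mid x_m)} \right]
  \;\le\; \log p_\theta(x_{1:M})
```

Each unimodal posterior must explain all $M$ modalities at once, so the gap to $\log p_\theta(x_{1:M})$ cannot close whenever one modality carries variation the others do not determine; this is the flavour of the undesirable upper bound the abstract refers to.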
Sense and Learn: Self-Supervision for Omnipresent Sensors
[article]
2021
arXiv
pre-print
... level of generalization on a task of interest. ...
In this work, we leverage the self-supervised learning paradigm towards realizing the vision of continual learning from unlabeled inputs. ...
ACKNOWLEDGEMENTS The authors would like to thank Félix de Chaumont Quitry, Marco Tagliasacchi and Richard F. Lyon for their valuable feedback and help with this work. ...
arXiv:2009.13233v2
fatcat:ver2i7o5zvgv3boterps4tqxcu
Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion
[article]
2020
arXiv
pre-print
Finally, we share our viewpoints regarding some future directions on this field. ...
Throughout this survey, we further indicate that the critical components for this field go to collaboration, adversarial competition and fusion over multi-modal spaces. ...
[110] studied a multi-view generation method named CR-GAN, which is a two-pathway learning model leveraging labeled and unlabeled data for self-supervised learning to improve generation quality. ...
arXiv:2006.08159v1
fatcat:g4467zmutndglmy35n3eyfwxku
Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction
[article]
2021
arXiv
pre-print
... preserving the specific patterns of each modality (resp. task). ...
Specifically, the channel exchanging process is self-guided by individual channel importance that is measured by the magnitude of Batch-Normalization (BN) scaling factor during training. ...
arXiv:2112.02252v1
fatcat:ul4gs5dajjc5lecol6psabn4pu
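A rough sketch of BN-guided channel exchanging between two modality streams. Only the idea comes from the abstract (replace channels whose BN scaling factor is near zero with the corresponding channels of the other stream); the function shape and the threshold value are assumptions.

```python
import torch
import torch.nn as nn

def exchange_channels(feat_a, feat_b, bn_a: nn.BatchNorm2d, bn_b: nn.BatchNorm2d,
                      threshold=1e-2):
    """Swap in the other stream's channels where this stream's BN gamma is tiny.

    feat_a, feat_b: (N, C, H, W) features of the two modality streams.
    """
    low_a = (bn_a.weight.detach().abs() < threshold).view(1, -1, 1, 1)
    low_b = (bn_b.weight.detach().abs() < threshold).view(1, -1, 1, 1)
    out_a = torch.where(low_a, feat_b, feat_a)  # unimportant channels replaced
    out_b = torch.where(low_b, feat_a, feat_b)
    return out_a, out_b
```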
DISSECT: DISentangle SharablE ConTent for Multimodal Integration and Crosswise-mapping
[article]
2020
bioRxiv
pre-print
In this work, we motivate a formal justification for domain-specific information separation in a simple linear case and illustrate that a self-supervised approach enables domain translation between data ...
We introduce an orthogonal gate block designed to enforce orthogonality of input feature sets by explicitly removing non-sharable information specific to each domain and illustrate separability of domain-specific ...
... autoencoding architecture and a separate gate layer to identify domain-specific information in a self-supervised manner. ...
doi:10.1101/2020.09.04.283234
fatcat:ox43wq2fuzfn3ponxxngjytofe
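One common way to enforce the orthogonality the abstract mentions is a penalty on the cross-correlation between shared and domain-specific features; the sketch below is a generic version of that idea, not the paper's actual gate block.

```python
import torch

def orthogonality_penalty(shared, specific):
    """Squared Frobenius norm of the cross-correlation between feature sets.

    shared, specific: (batch, dim) matrices; the penalty is zero when the
    two sets of (normalized) features span orthogonal directions.
    """
    shared = torch.nn.functional.normalize(shared, dim=1)
    specific = torch.nn.functional.normalize(specific, dim=1)
    cross = shared.t() @ specific   # (dim, dim) cross-correlation over the batch
    return (cross ** 2).sum()
```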
Deep Learning for Face Anti-Spoofing: A Survey
[article]
2021
arXiv
pre-print
... RGB camera, we summarize the deep learning applications under multi-modal (e.g., depth and infrared) or specialized (e.g., light field and flash) sensors. ...
... (e.g., pseudo depth map); 2) in addition to traditional intra-dataset evaluation, we collect and analyze the latest methods specially designed for domain generalization and open-set FAS; and 3) besides commercial ...
... (No. 2020YFC2003901), and the National Natural Science Foundation of China (No. 61876178, 61872367, and 61806196). ...
arXiv:2106.14948v1
fatcat:o2rkploxuzfs3lbievb5t6ycqm
Deep Generative Adversarial Networks for Image-to-Image Translation: A Review
2020
Symmetry
It also discusses and analyzes current state-of-the-art image-to-image translation techniques that are based on multimodal and multidomain representations. ...
Image-to-image translation with generative adversarial networks (GANs) has been intensively studied and applied to various tasks, such as multimodal image-to-image translation, super-resolution translation ...
Self-Attention GAN. SAGAN [72] has been proposed to incorporate a self-attention mechanism into a convolutional GAN framework to improve the quality of generated images. ...
doi:10.3390/sym12101705
fatcat:rqlwjjhrvbc6fhc4mxjjvkwk6i
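The SAGAN-style self-attention mentioned in the last snippet, as a compact PyTorch module. This follows the commonly cited formulation (1x1 query/key/value projections, softmax attention over spatial positions, learned residual scale) rather than any code from the survey; the channel reduction factor of 8 is a conventional choice.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Minimal SAGAN-style self-attention over conv feature maps (C >= 8)."""

    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)
        self.k = nn.Conv2d(channels, channels // 8, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # starts as the identity map

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)  # (N, HW, C//8) queries
        k = self.k(x).flatten(2)                  # (N, C//8, HW) keys
        v = self.v(x).flatten(2)                  # (N, C, HW) values
        attn = torch.softmax(q @ k, dim=-1)       # (N, HW, HW) over positions
        out = (v @ attn.transpose(1, 2)).view(n, c, h, w)
        return self.gamma * out + x               # learned residual blend
```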
Showing results 1 — 15 out of 681 results