
Cross-domain Face Presentation Attack Detection via Multi-domain Disentangled Representation Learning [article]

Guoqing Wang, Hu Han, Shiguang Shan, Xilin Chen
2020 arXiv   pre-print
Our approach consists of disentangled representation learning (DR-Net) and multi-domain learning (MD-Net).  ...  In light of this, we propose an efficient disentangled representation learning approach for cross-domain face PAD.  ...  We propose an effective disentangled representation learning approach for cross-domain presentation attack detection, which consists of disentangled representation learning (DR-Net) and multi-domain feature learning  ...
arXiv:2004.01959v1 fatcat:dp6bogljmzainmng6cui4ppla4

Image-to-Image Translation: Methods and Applications [article]

Yingxue Pang, Jianxin Lin, Tao Qin, Zhibo Chen
2021 arXiv   pre-print
Image-to-image translation (I2I) aims to transfer images from a source domain to a target domain while preserving the content representations.  ...  I2I has drawn increasing attention and made tremendous progress in recent years because of its wide range of applications in many computer vision and image processing problems, such as image synthesis,  ...  FID score in each domain and then average the scores. (4) For the multi-modal multi-domain setting, we first sample each image in each domain 19 times.  ...  (A minimal per-domain FID sketch follows this entry.)
arXiv:2101.08629v2 fatcat:i6pywjwnvnhp3i7cmgza2slnle
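Aside: the per-domain FID averaging mentioned in the excerpt above can be summarized in a short, hedged sketch. This is a generic illustration rather than code from the survey; it assumes Inception-feature means and covariances have already been computed per domain, and the names (frechet_distance, average_fid, per_domain_stats) are made up for the example.

```python
# Hedged sketch: average FID across domains from precomputed Inception
# feature statistics (mean mu, covariance sigma) of real and generated
# images in each domain.
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID between two Gaussians: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2*sqrt(S1 S2))."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):  # sqrtm may return negligible imaginary parts
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

def average_fid(per_domain_stats):
    """per_domain_stats: {domain: ((mu_real, cov_real), (mu_fake, cov_fake))}."""
    scores = {d: frechet_distance(mr, sr, mf, sf)
              for d, ((mr, sr), (mf, sf)) in per_domain_stats.items()}
    return scores, sum(scores.values()) / len(scores)
```

For the multi-modal multi-domain protocol quoted above, the generated-image statistics would be gathered after sampling each input several times (19 in the excerpt) per domain before the per-domain scores are averaged.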

Deep Learning for Face Anti-Spoofing: A Survey [article]

Zitong Yu, Yunxiao Qin, Xiaobai Li, Chenxu Zhao, Zhen Lei, Guoying Zhao
2022 arXiv   pre-print
RGB camera, we summarize the deep learning applications under multi-modal (e.g., depth and infrared) or specialized (e.g., light field and flash) sensors.  ...  (e.g., pseudo depth map); 2) in addition to traditional intra-dataset evaluation, we collect and analyze the latest methods specially designed for domain generalization and open-set FAS; and 3) besides commercial  ...  Acknowledgments This work was supported by the Academy of Finland for project MiGA (grant 316765), ICT 2023 project (grant 328115), Infotech Oulu, the National Key Research and Development Program of China  ...
arXiv:2106.14948v2 fatcat:wsheo7hbwvewhjoe6ykwjuqfii

A Survey on Adversarial Image Synthesis [article]

William Roy, Glen Kelly, Robert Leer, Frederick Ricardo
2021 arXiv   pre-print
In this paper, we provide a taxonomy of methods used in image synthesis, review different models for text-to-image synthesis and image-to-image translation, and discuss some evaluation metrics as well  ...  Generative Adversarial Networks (GANs) have been extremely successful in various application domains.  ...  Disentangled representations [66], [73], [74] in multi-modal outputs of supervised image synthesis.  ...
arXiv:2106.16056v2 fatcat:mivx26q4x5ampfi566tipcwv3e

Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution [article]

Somi Jeong, Jiyoung Lee, Kwanghoon Sohn
2022 arXiv   pre-print
We also exploit a contrastive learning objective, which improves the disentanglement ability and effectively utilizes multi-domain image data in the training process by pairing the semantically similar images  ...  To address this problem, we propose a novel multi-domain unsupervised image-to-image translation (MDUIT) framework that leverages the decomposed content feature and appearance adaptive convolution to translate  ...  Comparison between multi-modal and multi-domain models. For translating images between three domains, (a) the multi-modal model would need to train six separate networks.  ...  (A generic contrastive-loss sketch follows this entry.)
arXiv:2202.02779v1 fatcat:ns45db27sfculol7cy3d7v2dci
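Aside: the abstract above mentions a contrastive objective over pairs of semantically similar images but does not spell it out. The snippet below is a generic InfoNCE-style loss, not MDUIT's exact formulation; the function name, batch layout, and temperature value are assumptions made for illustration.

```python
# Generic InfoNCE-style contrastive loss (illustrative only): each anchor
# embedding is attracted to its semantically similar (positive) partner
# and repelled from every other item in the batch.
import torch
import torch.nn.functional as F

def info_nce(anchors: torch.Tensor, positives: torch.Tensor, temperature: float = 0.07):
    """anchors, positives: (batch, dim); row i of each forms a positive pair."""
    a = F.normalize(anchors, dim=1)
    p = F.normalize(positives, dim=1)
    logits = a @ p.t() / temperature                   # (batch, batch) similarity logits
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)            # diagonal entries are positives
```

In a translation framework, such embeddings would typically come from the content encoder, so that semantically similar images across domains are pulled toward a shared content representation.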

Image-to-image translation for cross-domain disentanglement [article]

Abel Gonzalez-Garcia, Joost van de Weijer, Yoshua Bengio
2018 arXiv   pre-print
We compare our model to the state-of-the-art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets  ...  In this paper, we bridge these two objectives and introduce the concept of cross-domain disentanglement. We aim to separate the internal representation into three parts.  ...  Acknowledgments We acknowledge the Spanish project TIN2016-79717-R and the CHISTERA project M2CR (PCIN2015-251).  ... 
arXiv:1805.09730v3 fatcat:vjtn6fhyovc2bktswkqys5oi6u

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions [article]

Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha
2021 arXiv   pre-print
The modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality using transfer of knowledge between modalities, including their representations and predictive  ...  domain.  ...  Multi-domain and Multi-modality Event Dataset (MMED) [114] is released to enable domain generalization for cross-modal retrieval.  ... 
arXiv:2107.13782v2 fatcat:s4spofwxjndb7leqbcqnwbifq4

MC-LCR: Multi-modal contrastive classification by locally correlated representations for effective face forgery detection [article]

Gaojian Wang, Qian Jiang, Xin Jin, Wei Li, Xiaohui Cui
2022 arXiv   pre-print
To address such limitations, we propose a novel framework named Multi-modal Contrastive Classification by Locally Correlated Representations (MC-LCR) for effective face forgery detection.  ...  However, advanced manipulations only perform small-scale tampering, posing challenges to comprehensively capturing subtle and local forgery artifacts, especially in high compression settings and cross-dataset  ...  Inspired by the above thoughts, we propose a novel Multi-modal Contrastive Classification by Locally Correlated Representations (MC-LCR) for face forgery detection.  ...
arXiv:2110.03290v2 fatcat:kcpx3ndznve67f5rvzxz6djfj4

GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modeling [article]

Yahui Liu, Marco De Nadai, Jian Yao, Nicu Sebe, Bruno Lepri, Xavier Alameda-Pineda
2020 arXiv   pre-print
First, it can be easily extended to most multi-domain and multi-modal image-to-image translation tasks.  ...  Second, the continuous domain encoding allows for interpolation between domains and for extrapolation to unseen domains and translations.  ...  A variational loss forces the latent representation to follow this GMM, where each component is associated with a domain. This is the key to providing both multi-modal and multi-domain translation.  ...  (A toy sketch of per-domain Gaussian attribute codes follows this entry.)
arXiv:2003.06788v2 fatcat:hf25f3b23feddo5jfuvnllhjpy
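Aside: as a rough illustration of the attribute Gaussian mixture idea summarized above (one Gaussian component per domain, sampling for multi-modal outputs, mixing components for unseen domains), here is a toy NumPy sketch. The dimensionalities, variances, and linear mixing scheme are assumptions for illustration, not GMM-UNIT's implementation.

```python
# Toy sketch of GMM-style domain attribute codes (illustrative only).
# Each domain d gets a Gaussian component N(mu[d], diag(sigma[d]^2));
# sampling from component d gives a multi-modal attribute code for that
# domain, and mixing two components' means gives a crude "unseen domain".
import numpy as np

rng = np.random.default_rng(0)
num_domains, code_dim = 3, 8                      # assumed sizes
mu = rng.normal(size=(num_domains, code_dim))     # component means, one per domain
sigma = np.full((num_domains, code_dim), 0.5)     # diagonal standard deviations

def sample_attribute(domain):
    """Draw one attribute code from the domain's Gaussian component."""
    return rng.normal(mu[domain], sigma[domain])

def mix_domains(d1, d2, alpha):
    """Interpolate between two component means (alpha > 1 extrapolates)."""
    return (1 - alpha) * mu[d1] + alpha * mu[d2]

z0 = sample_attribute(0)        # repeated calls give different codes (multi-modality)
z_mid = mix_domains(0, 1, 0.5)  # code for a domain 'between' domains 0 and 1
```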

Learning Disentangled Representations in the Imaging Domain [article]

Xiao Liu, Pedro Sanchez, Spyridon Thermos, Alison Q. O'Neil, Sotirios A. Tsaftaris
2022 arXiv   pre-print
In this tutorial paper, we motivate the need for disentangled representations, revisit key concepts, and describe practical building blocks and criteria for learning such representations.  ...  A good general representation can be fine-tuned for new target tasks using modest amounts of data, or used directly in unseen domains, achieving remarkable performance in the corresponding task.  ...  We thank the participants of the DREAM tutorials for feedback.  ...
arXiv:2108.12043v5 fatcat:cbpmp6pbajhjvjzovulswuj2wy

Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach [article]

Yahui Liu, Marco De Nadai, Deng Cai, Huayang Li, Xavier Alameda-Pineda, Nicu Sebe, Bruno Lepri
2020 arXiv   pre-print
Our proposed model disentangles the image content from the visual attributes, and it learns to modify the latter using the textual description, before generating a new image from the content and the modified  ...  Manipulating visual attributes of images through human-written text is a very challenging task. On the one hand, models have to learn the manipulation without the ground truth of the desired output.  ...  Recently, GMM-UNIT [29] proposed a unified approach for multi-domain and multi-modal translation by modeling attributes through a Gaussian mixture, in which each Gaussian component represents a domain  ... 
arXiv:2008.04200v1 fatcat:liyskz7lsfhnfl3b7fsyfethwa

Mutual Information-based Disentangled Neural Networks for Classifying Unseen Categories in Different Domains: Application to Fetal Ultrasound Imaging [article]

Qingjie Meng, Jacqueline Matthew, Veronika A. Zimmer, Alberto Gomez, David F.A. Lloyd, Daniel Rueckert, Bernhard Kainz
2021 arXiv   pre-print
We extensively evaluate the proposed method on fetal ultrasound datasets for two different image classification tasks where domain features are respectively defined by shadow artifacts and image acquisition  ...  Experimental results show that the proposed method outperforms the state-of-the-art on the classification of unseen categories in a target domain with sparsely labeled training data.  ...  RELATED WORK 1) Representation disentanglement: Disentangling representations aims at interpreting underlying interacting factors within data [5, 14] and enables the manipulation of relevant representations  ...
arXiv:2011.00739v2 fatcat:zlk6gah6oveljkmtfrnc52godi

Font Completion and Manipulation by Cycling Between Multi-Modality Representations [article]

Ye Yuan, Wuyang Chen, Zhaowen Wang, Matthew Fisher, Zhifei Zhang, Zhangyang Wang, Hailin Jin
2021 arXiv   pre-print
Specifically, we formulate a cross-modality cycled image-to-image model structure with a graph constructor between an image encoder and an image renderer.  ...  Our proposed cross-modality cycled representation learning has the potential to be applied to other domains with prior knowledge from different data modalities.  ...  To leverage multi-modal font representations, we design a cross-modality encoder-decoder to transit between different representations.  ... 
arXiv:2108.12965v1 fatcat:vtpuvufg3nadtfeixacg3l42d4

Play as You Like: Timbre-Enhanced Multi-Modal Music Style Transfer

Chien-Yu Lu, Min-Xin Xue, Chia-Che Chang, Che-Rung Lee, Li Su
2019 Proceedings of the AAAI Conference on Artificial Intelligence
To achieve this, learning stable multi-modal representations for both domain-variant (i.e., style) and domain-invariant (i.e., content) information of music in an unsupervised manner is critical.  ...  Besides, to characterize the multi-modal distribution of music pieces, we employ the Multi-modal Unsupervised Image-to-Image Translation (MUNIT) framework in the proposed system.  ...  The system has two main networks, cross-domain translation and within-domain reconstruction, as shown on the left of Figure 1: the proposed multi-modal music style transfer system with intrinsic consistency  ...  (A schematic content/style translation sketch follows this entry.)
doi:10.1609/aaai.v33i01.33011061 fatcat:3tx3b4ow2jbwdcn2cqvrq6yiei
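Aside: the two branches mentioned above (within-domain reconstruction and cross-domain translation under a MUNIT-style content/style split) can be sketched schematically. The toy code below uses placeholder linear modules and feature vectors that do not reflect the paper's actual networks or audio representation.

```python
# Schematic MUNIT-style sketch (placeholder shapes, illustrative only):
# encode an input into content + style codes, reconstruct it within its
# own domain, and translate across domains by decoding its content with
# a style code sampled for the other domain.
import torch
import torch.nn as nn

class TinyCodec(nn.Module):
    def __init__(self, feat_dim=64, style_dim=8):
        super().__init__()
        self.content_enc = nn.Linear(feat_dim, feat_dim)
        self.style_enc = nn.Linear(feat_dim, style_dim)
        self.dec = nn.Linear(feat_dim + style_dim, feat_dim)

    def encode(self, x):
        return self.content_enc(x), self.style_enc(x)

    def decode(self, content, style):
        return self.dec(torch.cat([content, style], dim=1))

codec_a, codec_b = TinyCodec(), TinyCodec()   # one codec per domain
x_a = torch.randn(4, 64)                      # stand-in for domain-A features

c_a, s_a = codec_a.encode(x_a)
recon_a = codec_a.decode(c_a, s_a)            # within-domain reconstruction
recon_loss = (recon_a - x_a).abs().mean()     # L1 reconstruction term

s_b = torch.randn(4, 8)                       # sampled style code => multi-modal outputs
x_ab = codec_b.decode(c_a, s_b)               # cross-domain translation A -> B
```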

Deep Generative Adversarial Networks for Image-to-Image Translation: A Review

Aziz Alotaibi
2020 Symmetry  
Such translation entails learning to map one visual representation of a given input to another representation.  ...  This article provides a comprehensive overview of image-to-image translation based on GAN algorithms and their variants.  ...  cross-domain image translation and manipulation.  ...
doi:10.3390/sym12101705 fatcat:rqlwjjhrvbc6fhc4mxjjvkwk6i