
Conditional Convolutional Neural Network for Modality-Aware Face Recognition

Chao Xiong, Xiaowei Zhao, Danhang Tang, Karlekar Jayashree, Shuicheng Yan, Tae-Kyun Kim
2015 IEEE International Conference on Computer Vision (ICCV)
Faces in the wild are usually captured with various poses, illuminations and occlusions, and thus inherently multimodally distributed in many tasks.  ...  For a given sample, the activations of convolution kernels in a certain layer are conditioned on its present intermediate representation and the activation status in the lower layers.  ...  Dropout is adopted at each layer, and the dropout rate is 0.5 for Multi-PIE and 0.2 for Occluded LFW, respectively.  ... 
doi:10.1109/iccv.2015.418 dblp:conf/iccv/XiongZTJYK15 fatcat:rmmes6zvsjfj7lvqyc5z6g2p6a
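The dropout rates quoted in the snippet above follow the standard inverted-dropout scheme; a minimal sketch of that mechanism (an illustrative assumption, not the paper's conditional-kernel architecture) is:

```python
# Inverted dropout: zero units with probability p during training and
# rescale survivors by 1/(1-p), so the expected activation is unchanged.
import numpy as np

def dropout(x, p, rng):
    """Apply inverted dropout with drop probability p."""
    mask = rng.random(x.shape) >= p          # keep with probability 1 - p
    return x * mask / (1.0 - p)              # rescale surviving units

rng = np.random.default_rng(0)
x = np.ones((4, 8))
y = dropout(x, p=0.5, rng=rng)               # rate the snippet cites for Multi-PIE
# surviving entries are scaled to 1 / (1 - 0.5) = 2.0, dropped ones are 0.0
assert set(np.unique(y)).issubset({0.0, 2.0})
```

With p=0.2 (the Occluded LFW rate), survivors would instead be scaled to 1.25.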

Triplet-Based Deep Hashing Network for Cross-Modal Retrieval [article]

Cheng Deng, Zhaojia Chen, Xianglong Liu, Xinbo Gao, Dacheng Tao
2019 arXiv   pre-print
In this paper, we propose a triplet-based deep hashing (TDH) network for cross-modal retrieval.  ...  In particular, cross-modal hashing has been widely and successfully used in multimedia similarity search applications.  ...  TRIPLET-BASED DEEP HASHING NETWORK FOR CROSS-MODAL RETRIEVAL In this section, we introduce our triplet-based deep hashing method (TDH) for cross-modal retrieval in detail, including formulations and learning  ... 
arXiv:1904.02449v1 fatcat:cid2o45pybf7fouueihmgzbdfm
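Triplet-based hashing objectives like the one named above are built around a triplet margin loss; a minimal sketch on real-valued relaxations of hash codes (the TDH paper's full objective with quantization and graph terms is not reproduced here) is:

```python
# Triplet margin loss: push the anchor-positive distance to be at least
# `margin` smaller than the anchor-negative distance.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: max(0, d(a,p) - d(a,n) + margin) with squared L2 distances."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, -1.0, 1.0])     # anchor code (relaxed, near-binary)
p = np.array([1.0, -1.0, -1.0])    # similar item: small distance to anchor
n = np.array([-1.0, 1.0, -1.0])    # dissimilar item: large distance
loss = triplet_loss(a, p, n)       # zero once the margin is satisfied
```

In a cross-modal setting the anchor would come from one modality (e.g. text) and the positive/negative from the other (e.g. images).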

A Multibias-mitigated and Sentiment Knowledge Enriched Transformer for Debiasing in Multimodal Conversational Emotion Recognition [article]

Jinglin Wang, Fang Ma, Yazhou Zhang, Dawei Song
2022 arXiv   pre-print
Multimodal emotion recognition in conversations (mERC) is an active research topic in natural language processing (NLP), which aims to predict humans' emotional states in communications of multiple modalities  ...  and visual representations (i.e., gender and age), followed by a Multibias-Mitigated and sentiment Knowledge Enriched bi-modal Transformer (MMKET).  ...  This leaves us with a research question: whether the current data-driven multi-modal emotion recognition in conversations approaches produce a biased error or not?  ... 
arXiv:2207.08104v1 fatcat:qbraee77tbektd6kskmafnx4gm

Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-modal Retrieval [article]

Donghuo Zeng, Yi Yu, Keizo Oyama
2021 arXiv   pre-print
The main challenge of audio-visual cross-modal retrieval task focuses on learning joint embeddings from a shared subspace for computing the similarity across different modalities, where generating new  ...  cross-modal retrieval methods.  ...  The main challenge of cross-modal retrieval is the modality gap and the key solution of cross-modal retrieval is learning joint embedding for different modalities.  ... 
arXiv:1908.03737v3 fatcat:qgldi32rrng27gltfbefqay4rq

Cross-Spectrum Dual-Subspace Pairing for RGB-infrared Cross-Modality Person Re-Identification [article]

Xing Fan, Hao Luo, Chi Zhang, Wei Jiang
2020 arXiv   pre-print
In this paper, a novel multi-spectrum image generation method is proposed and the generated samples are utilized to help the network to find discriminative information for re-identifying the same person  ...  To address this problem, we focus on extracting the shared cross-spectrum features of different modalities.  ...  ACKNOWLEDGMENT This work is supported by the National Natural Science Foundation of China (No. 61633019) and the Science Foundation of Chinese Aerospace Industry (JCKY2018204B053).  ... 
arXiv:2003.00213v1 fatcat:576wp66ru5cfrbumzajldrilgm

Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval [article]

Jing Yu, Yuhang Lu, Zengchang Qin, Yanbing Liu, Jianlong Tan, Li Guo, Weifeng Zhang
2018 arXiv   pre-print
A dual-path neural network model is proposed for coupled feature learning in cross-modal information retrieval.  ...  For cross-modal information retrieval between images and texts, existing work mostly uses off-the-shelf Convolutional Neural Network (CNN) for image feature extraction.  ...  (b) Overview of our proposed cross-modal retrieval model. Figure 1: Comparison of classical cross-modal retrieval models to our model.  ... 
arXiv:1802.00985v2 fatcat:tcobdmh5qfcezmcj7oaevaz4gi
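The graph-convolutional text modeling named in the entry above typically uses the propagation rule H' = ReLU(Â H W), with Â the symmetrically normalized adjacency with self-loops (the Kipf & Welling form). The toy graph and shapes below are illustrative assumptions, not the paper's exact word-graph pipeline:

```python
# One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).
import numpy as np

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt        # symmetric normalization
    return np.maximum(0.0, A_norm @ H @ W)          # ReLU activation

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)              # 3-node path graph
H = np.eye(3)                                       # one-hot node features
W = np.ones((3, 2))                                 # toy weight matrix
out = gcn_layer(A, H, W)                            # shape (3, 2)
```

In the text-modeling setting, nodes would be words and edges co-occurrence or semantic relations; stacking such layers mixes features over larger neighborhoods.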

Deep Cross Modal Learning for Caricature Verification and Identification(CaVINet) [article]

Jatin Garg, Skand Vishwanath Peri, Himanshu Tolani, Narayanan C Krishnan
2018 arXiv   pre-print
In this paper, we look at the challenging problem of cross modal face verification and recognition between caricature and visual image modalities.  ...  This paper presents the first cross modal architecture that handles extreme distortions of caricatures using a deep learning network that learns similar representations across the modalities.  ...  ACKNOWLEDGMENTS The authors gratefully acknowledge NVIDIA for the hardware grant. This research is supported by the Department of Science and Technology, India under grant YSS/2015/001206.  ... 
arXiv:1807.11688v1 fatcat:ngz4yo2t7nbblpm227micvio4e

A Natural and Immersive Virtual Interface for the Surgical Safety Checklist Training

Andrea Ferracani, Daniele Pezzatini, Alberto Del Bimbo
2014 Proceedings of the 2014 ACM International Workshop on Serious Games - SeriousGames '14  
With the focus on natural language and entity understanding, for instance, we have improved Bing's ability to understand the user intent beyond queries and keywords.  ...  Specifically, I will talk about how we have significantly improved image search quality, and built differentiated image search user experience using NLP, entity, big data, machine learning and computer  ...  Modeling Attributes from Category-Attribute Proportions Exploiting Correlation Consensus: Towards Subspace Clustering for Multi-modal Data Learning Multimodal Neural Network with Ranking Examples Supervised  ... 
doi:10.1145/2656719.2656725 dblp:conf/mm/FerracaniPB14a fatcat:obsb2i4iybhu3dq77hujvjtbze

Predicting Visual Features from Text for Image and Video Caption Retrieval

Jianfeng Dong, Xirong Li, Cees G.M. Snoek
2018 IEEE transactions on multimedia  
Different from existing works, which rely on a joint subspace for their image and video caption retrieval, we propose to do so in a visual space exclusively.  ...  Example captions are encoded into a textual embedding based on multi-scale sentence vectorization and further transferred into a deep visual feature of choice via a simple multi-layer perceptron.  ...  Hence, feature transformations are performed on both sides to learn a common latent subspace where the two modalities are better represented and a cross-modal similarity can be computed [1] , [2] .  ... 
doi:10.1109/tmm.2018.2832602 fatcat:ypowsjvjyvbhfhtf42l6rokhii
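The "visual space" idea described above maps a text embedding through a small MLP into the image-feature space and ranks images by cosine similarity there. All weights and dimensions below are made-up placeholders, not the paper's trained model:

```python
# Map a text embedding into a visual-feature space via a 2-layer MLP,
# then rank a toy image database by cosine similarity in that space.
import numpy as np

def mlp(x, W1, b1, W2, b2):
    h = np.tanh(x @ W1 + b1)         # hidden layer
    return h @ W2 + b2               # projection into visual-feature space

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
text_vec = rng.standard_normal(300)               # e.g. a sentence embedding
W1, b1 = rng.standard_normal((300, 128)), np.zeros(128)
W2, b2 = rng.standard_normal((128, 2048)), np.zeros(2048)
pred_visual = mlp(text_vec, W1, b1, W2, b2)       # predicted visual feature

image_feats = rng.standard_normal((5, 2048))      # toy image database
scores = [cosine(pred_visual, f) for f in image_feats]
ranking = np.argsort(scores)[::-1]                # best match first
```

The design point from the abstract is that only the text side is learned: image features stay in their native (e.g. CNN) space, avoiding a joint subspace.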

A Decade Survey of Content Based Image Retrieval using Deep Learning [article]

Shiv Ram Dubey
2020 arXiv   pre-print
Generally, the similarity between the representative features of the query image and dataset images is used to rank the images for retrieval.  ...  This paper presents a comprehensive survey of deep learning based developments in the past decade for content based image retrieval.  ...  In 2017, Wang et al. have generated the common subspace based on adversarial learning for cross-modal retrieval [169] .  ... 
arXiv:2012.00641v1 fatcat:2zcho2szpzcc3cs6uou3jpcley

Deep Learning: Methods and Applications

Li Deng
2014 Foundations and Trends® in Signal Processing  
retrieval (Section 9), object recognition and computer vision (Section 10), and multi-modal and multi-task learning (Section 11).  ...  Multi-modalities: Text and image The underlying mechanism for potential effectiveness of multi-modal learning involving text and image is the common semantics associated with the text and image.  ... 
doi:10.1561/2000000039 fatcat:vucffxhse5gfhgvt5zphgshjy4

Towards Robust Pattern Recognition: A Review [article]

Xu-Yao Zhang, Cheng-Lin Liu, Ching Y. Suen
2020 arXiv   pre-print
Actually, our brain is robust at learning concepts continually and incrementally, in complex, open and changing environments, with different contexts, modalities and tasks, by showing only a few examples  ...  directions for robust pattern recognition.  ...  Therefore, multi-modal learning [9] and multi-task learning [32] are also important issues for robust pattern recognition.  ... 
arXiv:2006.06976v1 fatcat:mn35i7bmhngl5hxr3vukdcmmde

Multimodal embedding fusion for robust speaker role recognition in video broadcast

Michael Rouvier, Sebastien Delecraz, Benoit Favre, Meriem Bendris, Frederic Bechet
2015 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)  
Existing approaches mostly consider one modality, either audio (speaker role recognition) or image (shot role recognition), firstly because of the non-synchrony between both modalities, and secondly because  ...  This paper presents a multimodal fusion of audio, text and image embeddings spaces for speaker role recognition in asynchronous data.  ...  In [20] , authors proposed to learn visual and linguistic features jointly for image labelling and retrieval tasks.  ... 
doi:10.1109/asru.2015.7404820 dblp:conf/asru/RouvierDFBB15 fatcat:2wqd5ewxizbprakccn2adcujqy

Learning Semantic Features for Classifying Very Large Image Datasets Using Convolution Neural Network

A. Shubha Rao, K. Mahantesh
2021 SN Computer Science  
In this paper, a model is generated using CNN (VGG-16) architecture which combines convolution and max pooling layers at different levels using effective regularization and transfer learning with data  ...  Advancements in sensors and image acquisition devices lead to a tremendous increase in the creation of unlabeled image databases, and traditional image retrieval approaches are inefficient in retrieving semantic  ...  Later, convolution layers are used for learning the features, along with efficient regularization using dropout and data augmentation techniques.  ... 
doi:10.1007/s42979-021-00589-6 fatcat:ngnrghieybgbrbttaxtg2onqzm

2021 Index IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 43

2022 IEEE Transactions on Pattern Analysis and Machine Intelligence  
Departments and other items may also be covered if they have been judged to have archival value. The Author Index contains the primary entry for each item, listed under the first author's name.  ...  The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination.  ...  ., +, TPAMI June 2021 1981-1997 MTFH: A Matrix Tri-Factorization Hashing Framework for Efficient Cross-Modal Retrieval.  ... 
doi:10.1109/tpami.2021.3126216 fatcat:h6bdbf2tdngefjgj76cudpoyia