Conditional Convolutional Neural Network for Modality-Aware Face Recognition
2015
2015 IEEE International Conference on Computer Vision (ICCV)
Faces in the wild are usually captured with various poses, illuminations and occlusions, and thus inherently multimodally distributed in many tasks. ...
For a given sample, the activations of convolution kernels in a certain layer are conditioned on its present intermediate representation and the activation status in the lower layers. ...
Dropout is adopted at each layer, and the dropout rate is 0.5 for Multi-PIE and 0.2 for Occluded LFW, respectively. ...
doi:10.1109/iccv.2015.418
dblp:conf/iccv/XiongZTJYK15
fatcat:rmmes6zvsjfj7lvqyc5z6g2p6a
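The per-layer dropout recipe quoted above is concrete enough to sketch. Below is a minimal PyTorch sketch of a small CNN with dropout after every layer, using the reported rates (0.5 for Multi-PIE, 0.2 for Occluded LFW); the layer sizes and classifier head are assumptions, not the authors' conditional CNN architecture.

```python
# Minimal sketch of per-layer dropout as described in the snippet.
# Layer sizes and the conditioning mechanism of the actual c-CNN are
# NOT reproduced; this only illustrates the dropout placement and rates.
import torch.nn as nn

def make_cnn(dropout_rate: float) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Dropout2d(dropout_rate),      # dropout at each layer
        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Dropout2d(dropout_rate),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(64, 337),              # hypothetical identity head
    )

model_multipie = make_cnn(0.5)   # rate reported for Multi-PIE
model_lfw      = make_cnn(0.2)   # rate reported for Occluded LFW
```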
Triplet-Based Deep Hashing Network for Cross-Modal Retrieval
[article]
2019
arXiv
pre-print
In this paper, we propose a triplet-based deep hashing (TDH) network for cross-modal retrieval. ...
In particular, cross-modal hashing has been widely and successfully used in multimedia similarity search applications. ...
TRIPLET-BASED DEEP HASHING NETWORK FOR CROSS-MODAL RETRIEVAL In this section, we introduce our triplet-based deep hashing method (TDH) for cross-modal retrieval in detail, including formulations and learning ...
arXiv:1904.02449v1
fatcat:cid2o45pybf7fouueihmgzbdfm
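The core of a triplet-based cross-modal hashing objective can be illustrated compactly. The sketch below is a minimal PyTorch rendering of a generic cross-modal triplet loss with a tanh relaxation of binary codes; the margin, code length, and relaxation are assumptions, not the TDH formulation itself.

```python
# Generic cross-modal triplet objective (NOT the paper's exact loss):
# pull the matching text code toward the image code, push a mismatch away.
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(img_anchor, txt_pos, txt_neg, margin=1.0):
    h_a = torch.tanh(img_anchor)   # relaxed binary codes in (-1, 1)
    h_p = torch.tanh(txt_pos)
    h_n = torch.tanh(txt_neg)
    d_pos = F.pairwise_distance(h_a, h_p)
    d_neg = F.pairwise_distance(h_a, h_n)
    return F.relu(d_pos - d_neg + margin).mean()

codes = lambda: torch.randn(8, 32, requires_grad=True)  # batch of 32-bit codes
loss = cross_modal_triplet_loss(codes(), codes(), codes())
loss.backward()
```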
A Multibias-mitigated and Sentiment Knowledge Enriched Transformer for Debiasing in Multimodal Conversational Emotion Recognition
[article]
2022
arXiv
pre-print
Multimodal emotion recognition in conversations (mERC) is an active research topic in natural language processing (NLP), which aims to predict humans' emotional states in communications of multiple modalities ...
...and visual representations (i.e., gender and age), followed by a Multibias-Mitigated and sentiment Knowledge Enriched bi-modal Transformer (MMKET). ...
This leaves us with a research question: do current data-driven approaches to multimodal emotion recognition in conversations produce biased errors or not? ...
arXiv:2207.08104v1
fatcat:qbraee77tbektd6kskmafnx4gm
Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-modal Retrieval
[article]
2021
arXiv
pre-print
The main challenge of audio-visual cross-modal retrieval task focuses on learning joint embeddings from a shared subspace for computing the similarity across different modalities, where generating new ...
cross-modal retrieval methods. ...
The main challenge of cross-modal retrieval is the modality gap and the key solution of cross-modal retrieval is learning joint embedding for different modalities. ...
arXiv:1908.03737v3
fatcat:qgldi32rrng27gltfbefqay4rq
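The shared-subspace step can be illustrated with plain CCA from scikit-learn. This is a toy sketch on random stand-in features, not the paper's Cluster-CCA with deep triplet networks; it only shows learning a joint embedding for two modalities and computing a cross-modal cosine similarity for ranking.

```python
# Toy joint-embedding step with plain CCA (the paper uses Cluster-CCA
# plus deep triplet networks; features here are random stand-ins).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
audio = rng.normal(size=(200, 40))    # stand-in audio features
visual = rng.normal(size=(200, 60))   # paired stand-in visual features

cca = CCA(n_components=10)
audio_z, visual_z = cca.fit_transform(audio, visual)

# Cross-modal similarity in the shared subspace (cosine), used for ranking.
a = audio_z / np.linalg.norm(audio_z, axis=1, keepdims=True)
v = visual_z / np.linalg.norm(visual_z, axis=1, keepdims=True)
similarity = a @ v.T   # similarity[i, j]: audio clip i vs. visual item j
```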
Cross-Spectrum Dual-Subspace Pairing for RGB-infrared Cross-Modality Person Re-Identification
[article]
2020
arXiv
pre-print
In this paper, a novel multi-spectrum image generation method is proposed and the generated samples are utilized to help the network to find discriminative information for re-identifying the same person ...
To address this problem, we focus on extracting the shared cross-spectrum features of different modalities. ...
ACKNOWLEDGMENT This work is supported by the National Natural Science Foundation of China (No. 61633019) and the Science Foundation of Chinese Aerospace Industry (JCKY2018204B053). ...
arXiv:2003.00213v1
fatcat:576wp66ru5cfrbumzajldrilgm
Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval
[article]
2018
arXiv
pre-print
A dual-path neural network model is proposed for coupled feature learning in cross-modal information retrieval. ...
For cross-modal information retrieval between images and texts, existing work mostly uses off-the-shelf Convolutional Neural Network (CNN) for image feature extraction. ...
(b) Overview of our proposed cross-modal retrieval model. Figure 1: Comparison of classical cross-modal retrieval models to our model. ...
arXiv:1802.00985v2
fatcat:tcobdmh5qfcezmcj7oaevaz4gi
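The text-side path the paper proposes, modeling text as a graph and applying graph convolution, reduces to a simple propagation rule. Below is a minimal numpy sketch of one Kipf-style graph-convolution layer on a toy word graph; the adjacency, feature sizes, and weights are illustrative assumptions (the image side would use a standard CNN).

```python
# One graph-convolution step on a toy word graph (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
n_words, d_in, d_out = 5, 16, 8
X = rng.normal(size=(n_words, d_in))   # word-node features
# Toy chain adjacency with self-loops standing in for a real word graph.
A = np.eye(n_words) + np.eye(n_words, k=1) + np.eye(n_words, k=-1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt    # symmetric normalization
W = rng.normal(size=(d_in, d_out))
H = np.maximum(A_hat @ X @ W, 0.0)     # one GCN layer with ReLU
```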
Deep Cross Modal Learning for Caricature Verification and Identification(CaVINet)
[article]
2018
arXiv
pre-print
In this paper, we look at the challenging problem of cross modal face verification and recognition between caricature and visual image modalities. ...
This paper presents the first cross modal architecture that handles extreme distortions of caricatures using a deep learning network that learns similar representations across the modalities. ...
ACKNOWLEDGMENTS The authors gratefully acknowledge NVIDIA for the hardware grant. This research is supported by the Department of Science and Technology, India under grant YSS/2015/001206. ...
arXiv:1807.11688v1
fatcat:ngz4yo2t7nbblpm227micvio4e
A Natural and Immersive Virtual Interface for the Surgical Safety Checklist Training
2014
Proceedings of the 2014 ACM International Workshop on Serious Games - SeriousGames '14
With the focus on natural language and entity understanding, for instance, we have improved Bing's ability to understand the user intent beyond queries and keywords. ...
Specifically, I will talk about how we have significantly improved image search quality, and built differentiated image search user experience using NLP, entity, big data, machine learning and computer ...
Modeling Attributes from Category-Attribute Proportions
Exploiting Correlation Consensus: Towards Subspace Clustering for Multi-modal Data
Learning Multimodal Neural Network with Ranking Examples
Supervised ...
doi:10.1145/2656719.2656725
dblp:conf/mm/FerracaniPB14a
fatcat:obsb2i4iybhu3dq77hujvjtbze
Predicting Visual Features from Text for Image and Video Caption Retrieval
2018
IEEE transactions on multimedia
Different from existing works, which rely on a joint subspace for their image and video caption retrieval, we propose to do so in a visual space exclusively. ...
Example captions are encoded into a textual embedding based on multi-scale sentence vectorization and further transferred into a deep visual feature of choice via a simple multi-layer perceptron. ...
Hence, feature transformations are performed on both sides to learn a common latent subspace where the two modalities are better represented and a cross-modal similarity can be computed [1], [2]. ...
doi:10.1109/tmm.2018.2832602
fatcat:ypowsjvjyvbhfhtf42l6rokhii
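The transfer step described above, from a sentence embedding into a visual feature space via a simple multi-layer perceptron, can be sketched directly. The dimensions below (a 300-d text vector, a 2048-d visual feature) are plausible assumptions, not the paper's exact configuration.

```python
# Map a sentence embedding into a visual feature space with an MLP,
# then retrieve by comparing against image CNN features in that space.
import torch
import torch.nn as nn

text_to_visual = nn.Sequential(
    nn.Linear(300, 1024),
    nn.ReLU(),
    nn.Linear(1024, 2048),   # target: a deep visual feature space of choice
)

caption_vec = torch.randn(1, 300)               # sentence vectorization output
predicted_visual = text_to_visual(caption_vec)  # compare to image features
```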
A Decade Survey of Content Based Image Retrieval using Deep Learning
[article]
2020
arXiv
pre-print
Generally, the similarity between the representative features of the query image and dataset images is used to rank the images for retrieval. ...
This paper presents a comprehensive survey of deep learning based developments in the past decade for content based image retrieval. ...
In 2017, Wang et al. generated a common subspace based on adversarial learning for cross-modal retrieval [169]. ...
arXiv:2012.00641v1
fatcat:2zcho2szpzcc3cs6uou3jpcley
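The ranking step described in the first snippet is straightforward to sketch: rank dataset images by the similarity of their representative features to the query's. A minimal numpy version with cosine similarity on stand-in features:

```python
# Rank dataset images by cosine similarity of deep features to the query.
import numpy as np

rng = np.random.default_rng(2)
db_feats = rng.normal(size=(1000, 256))   # stand-in features, one per image
query = rng.normal(size=(256,))

db_n = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
q_n = query / np.linalg.norm(query)
ranking = np.argsort(-(db_n @ q_n))       # image indices, most similar first
top10 = ranking[:10]
```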
Deep Learning: Methods and Applications
2014
Foundations and Trends® in Signal Processing
retrieval (Section 9), object recognition and computer vision (Section 10), and multi-modal and multi-task learning (Section 11). ...
Multi-modalities: Text and image The underlying mechanism for potential effectiveness of multi-modal learning involving text and image is the common semantics associated with the text and image. ...
doi:10.1561/2000000039
fatcat:vucffxhse5gfhgvt5zphgshjy4
Towards Robust Pattern Recognition: A Review
[article]
2020
arXiv
pre-print
Actually, our brain is robust at learning concepts continually and incrementally, in complex, open and changing environments, with different contexts, modalities and tasks, by showing only a few examples ...
directions for robust pattern recognition. ...
Therefore, multi-modal learning [9] and multi-task learning [32] are also important issues for robust pattern recognition. ...
arXiv:2006.06976v1
fatcat:mn35i7bmhngl5hxr3vukdcmmde
Multimodal embedding fusion for robust speaker role recognition in video broadcast
2015
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Existing approaches mostly consider one modality, either audio (speaker role recognition) or image (shot role recognition), firstly because of the non-synchrony between both modalities, and secondly because ...
This paper presents a multimodal fusion of audio, text and image embeddings spaces for speaker role recognition in asynchronous data. ...
In [20], the authors proposed to learn visual and linguistic features jointly for image labelling and retrieval tasks. ...
doi:10.1109/asru.2015.7404820
dblp:conf/asru/RouvierDFBB15
fatcat:2wqd5ewxizbprakccn2adcujqy
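The fusion described above can be sketched as simple late fusion: concatenate the audio, text, and image embeddings and classify. The dimensions, fusion strategy, and number of roles below are assumptions, not the paper's configuration.

```python
# Late fusion of audio, text and image embeddings for role classification
# (dimensions and role count are hypothetical).
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, d_audio=128, d_text=300, d_image=512, n_roles=5):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(d_audio + d_text + d_image, 256),
            nn.ReLU(),
            nn.Linear(256, n_roles),
        )

    def forward(self, audio, text, image):
        return self.head(torch.cat([audio, text, image], dim=-1))

model = FusionClassifier()
logits = model(torch.randn(4, 128), torch.randn(4, 300), torch.randn(4, 512))
```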
Learning Semantic Features for Classifying Very Large Image Datasets Using Convolution Neural Network
2021
SN Computer Science
In this paper, a model is generated using the CNN (VGG-16) architecture, which combines convolution and max-pooling layers at different levels using effective regularization and transfer learning with data ...
Advancements in sensors and image acquisition devices lead to a tremendous increase in the creation of unlabeled image databases, and traditional image retrieval approaches are inefficient in retrieving semantic ...
Later, convolution layers are used for learning the features, along with efficient regularization using dropout and data augmentation techniques. ...
doi:10.1007/s42979-021-00589-6
fatcat:ngnrghieybgbrbttaxtg2onqzm
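The recipe in the snippets, a pretrained VGG-16 backbone with dropout regularization, transfer learning, and data augmentation, can be sketched in PyTorch/torchvision. The specific augmentations, dropout rate, and class count below are illustrative assumptions, not the paper's settings.

```python
# VGG-16 transfer learning with dropout and data augmentation (illustrative).
import torch.nn as nn
from torchvision import models, transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
for p in vgg.features.parameters():
    p.requires_grad = False          # transfer learning: freeze conv layers
vgg.classifier = nn.Sequential(      # new head with dropout regularization
    nn.Linear(25088, 4096), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(4096, 100),            # hypothetical number of classes
)
```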
2021 Index IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 43
2022
IEEE Transactions on Pattern Analysis and Machine Intelligence
Departments and other items may also be covered if they have been judged to have archival value. The Author Index contains the primary entry for each item, listed under the first author's name. ...
The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination. ...
MTFH: A Matrix Tri-Factorization Hashing Framework for Efficient Cross-Modal Retrieval. TPAMI, June 2021, 1981-1997. ...
doi:10.1109/tpami.2021.3126216
fatcat:h6bdbf2tdngefjgj76cudpoyia
Showing results 1 — 15 out of 220 results