Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval [article]

Zhenguo Yang, Zehang Lin, Peipei Kang, Jianming Lv, Qing Li, Wenyin Liu
2019 arXiv   pre-print
In this paper, we propose to learn shared semantic space with correlation alignment (S^3CA) for multimodal data representations, which aligns nonlinear correlations of multimodal data distributions in  ...  Furthermore, we project the multimodal data into a shared semantic space for cross-modal (event) retrieval, where the distances between heterogeneous data samples can be measured directly.  ...  We contribute a weakly-aligned unpaired Wiki-Flickr Event dataset as a complement of the existing paired datasets for cross-modal retrieval.  ... 
arXiv:1901.04268v3 fatcat:hipjb7ba2fg3hp5g5d3oq3kaki

Diachronic Cross-modal Embeddings

David Semedo, Joao Magalhaes
2019 Proceedings of the 27th ACM International Conference on Multimedia - MM '19  
This paper introduces a novel diachronic cross-modal embedding (DCM), where cross-modal correlations are represented in embedding space, throughout the temporal dimension, preserving semantic similarity  ...  Understanding the semantic shifts of multimodal information is only possible with models that capture cross-modal interactions over time.  ...  Static cross-modal embedding models represent multimodal data in a common space. Early approaches [8, 13, 23, 39] learn projections based on linear correlation.  ... 
doi:10.1145/3343031.3351036 dblp:conf/mm/SemedoM19a fatcat:sv6uekobmbfxteqybxt6tnv26i

Deep Learning Techniques for Future Intelligent Cross-Media Retrieval [article]

Sadaqat ur Rehman, Muhammad Waqas, Shanshan Tu, Anis Koubaa, Obaid ur Rehman, Jawad Ahmad, Muhammad Hanif, Zhu Han
2020 arXiv   pre-print
In this paper, we provide a novel taxonomy according to the challenges faced by multi-modal deep learning approaches in solving cross-media retrieval, namely: representation, alignment, and translation  ...  Then, we present some well-known cross-media datasets used for retrieval, considering the importance of these datasets in the context of deep learning-based cross-media retrieval approaches.  ...  Furthermore, in the case of joint semantic space for multimodal data, cross-media correlation learning is performed for feature extraction.  ... 
arXiv:2008.01191v1 fatcat:t63bg55w2vdqjcprzaaidrmprq

A Comprehensive Survey on Cross-modal Retrieval [article]

Kaiye Wang, Qiyue Yin, Wei Wang, Shu Wu, Liang Wang
2016 arXiv   pre-print
To speed up the cross-modal retrieval, a number of binary representation learning methods are proposed to map different modalities of data into a common Hamming space.  ...  In this paper, we first review a number of representative methods for cross-modal retrieval and classify them into two main groups: 1) real-valued representation learning, and 2) binary representation  ...  Hence, it is helpful for cross-modal retrieval to learn a discriminative common representation space.  ... 
arXiv:1607.06215v1 fatcat:jfbmmlvzrvcmtmzezogzuxvvqu

Multimodal Machine Learning: A Survey and Taxonomy [article]

Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency
2017 arXiv   pre-print
Multimodal machine learning aims to build models that can process and relate information from multiple modalities.  ...  We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and  ...  Another neural alternative for aligning images with captions for cross-modal retrieval was proposed by Karpathy et al. [98], [99].  ... 
arXiv:1705.09406v2 fatcat:262fo4sihffvxecg4nwsifoddm

Cross-Modal Retrieval between Event-Dense Text and Image

Zhongwei Xie, Lin Li, Luo Zhong, Jianquan Liu, Ling Liu
2022 Proceedings of the 2022 International Conference on Multimedia Retrieval  
Finally, we integrate text embedding and image embedding with the loss optimization empowered with the event tag by iteratively regulating the joint embedding learning for cross-modal retrieval.  ...  It is known that modality alignment is crucial for retrieval performance.  ...  In this paper, we introduce the task of event-dense text-image cross-modal retrieval and study how to improve its cross-modal alignment.  ... 
doi:10.1145/3512527.3531374 fatcat:5gvfwcywwregvlgzeoia6omaka

Semi-supervised learning based semantic cross-media retrieval

Xiyuan Zheng, Wei Zhu, Zhenmei Yu, Meijia Zhang
2021 IEEE Access  
This method is an extension of the CCA method and mainly addresses cross-media retrieval with multi-label annotations; it also learns shared subspaces through advanced semantic information for multi-label  ...  These terms interact with each other and embed more semantics in the shared subspace.  ... 
doi:10.1109/access.2021.3080976 fatcat:q3247flvm5f7nkx2tndatep7he

Cross-modal Ambiguity Learning for Multimodal Fake News Detection

Yixuan Chen, Dongsheng Li, Peng Zhang, Jie Sui, Qin Lv, Lu Tun, Li Shang
2022 Proceedings of the ACM Web Conference 2022  
CAFE consists of 1) a cross-modal alignment module to transform the heterogeneous unimodality features into a shared semantic space, 2) a cross-modal ambiguity learning module to estimate the ambiguity  ...  between different modalities, and 3) a cross-modal fusion module to capture the cross-modal correlations.  ...  ACKNOWLEDGMENTS We are grateful to Liang Hu for his valuable comments and suggestions in the early days of this work.  ... 
doi:10.1145/3485447.3511968 fatcat:p363uf7fene5bf44p62bt5zlee

New Ideas and Trends in Deep Multimodal Content Understanding: A Review [article]

Wei Chen and Weiping Wang and Li Liu and Michael S. Lew
2020 arXiv   pre-print
These models go beyond the simple image classifiers in which they can do uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering  ...  The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and text.  ...  1) Cross-modal retrieval Single-modal and cross-modal retrieval have been researched for decades [61] .  ... 
arXiv:2010.08189v1 fatcat:2l7molbcn5hf3oyhe3l52tdwra

Discriminative Supervised Subspace Learning for Cross-modal Retrieval [article]

Haoming Zhang, Xiao-Jun Wu, Tianyang Xu, Donglin Zhang
2022 arXiv   pre-print
Measuring similarity between heterogeneous data is still an open problem for cross-modal retrieval. The core of cross-modal retrieval is how to measure the similarity between different types of data.  ...  In this paper, we propose discriminative supervised subspace learning for cross-modal retrieval (DS2L), to make full use of discriminative information and better preserve the semantically structural information  ...  To solve this problem, supervised cross-modal retrieval approaches are developed to learn a more discriminative shared space with supervised semantic label information.  ... 
arXiv:2201.11843v1 fatcat:6b7c4xgewzab3b32wreh27cddm

New Ideas and Trends in Deep Multimodal Content Understanding: A Review

Wei Chen, Weiping Wang, Li Liu, Michael S. Lew
2020 Neurocomputing  
These models go beyond the simple image classifiers in which they can do uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering  ...  The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and text.  ...  His research interest focuses on cross-modal retrieval with deep learning methods.  ... 
doi:10.1016/j.neucom.2020.10.042 fatcat:hyjkj5enozfrvgzxy6avtbmoxu

HANet: Hierarchical Alignment Networks for Video-Text Retrieval [article]

Peng Wu, Xiangteng He, Mingqian Tang, Yiliang Lv, Jing Liu
2021 arXiv   pre-print
Video-text retrieval is an important yet challenging task in vision-language understanding, which aims to learn a joint embedding space where related video and text instances are close to each other.  ...  Different level alignments capture fine-to-coarse correlations between video and text, as well as take advantage of the complementary information among three semantic levels.  ...  Cross-modal Concept Learning In the last few years, cross-modal concept learning has usually been utilized for a new challenge in TRECVID, i.e., Ad-hoc Video Search (AVS).  ... 
arXiv:2107.12059v2 fatcat:hdiivh6kv5ag3gdkivx2u33vsy

Cross-media analysis and reasoning: advances and directions

Yu-xin Peng, Wen-wu Zhu, Yao Zhao, Chang-sheng Xu, Qing-ming Huang, Han-qing Lu, Qing-hua Zheng, Tie-jun Huang, Wen Gao
2017 Frontiers of Information Technology & Electronic Engineering  
To address these issues, we provide an overview as follows: (1) theory and model for cross-media uniform representation; (2) cross-media correlation understanding and deep mining; (3) cross-media knowledge  ...  graph construction and learning methodologies; (4) cross-media knowledge evolution and reasoning; (5) cross-media description and generation; (6) cross-media intelligent engines; and (7) cross-media intelligent  ...  Acknowledgements The authors would like to thank Peng CUI, Shi-kui WEI, Ji-tao SANG, Shu-hui WANG, Jing LIU, and Bu-yue QIAN for their valuable discussions and assistance.  ... 
doi:10.1631/fitee.1601787 fatcat:dqnizhdlbfhpvodzkhv5nlarxq

Deep multimodal representation learning: a survey

Wenzhong Guo, Jianwen Wang, Shiping Wang
2019 IEEE Access  
Due to the powerful representation ability with multiple levels of abstraction, deep learning-based multimodal representation learning has attracted much attention in recent years.  ...  Finally, we suggest some important directions for future work.  ...  In this way, the essential cross-modal correlation for cross-modal retrieval is captured.  ... 
doi:10.1109/access.2019.2916887 fatcat:ms4wcgl5rncsbiywz27uss4ysq

Cross-Modal Hashing by lp-Norm Multiple Subgraph Combination

Dongxiao Ren, Junwei Huang, Zhonghua Wang, Fang Lu
2021 IEEE Access  
With the explosion of multi-modal Web data, effective and efficient techniques are in urgent need for cross-modal data retrieval with relevant semantics.  ...  The hash functions for different modalities are learned separately by utilizing nonlinear classification models, encoding the complicated semantic relations among cross-modal data.  ...  [44] propose a deep visual-semantic hashing for cross-modal retrieval which consists of a visual-semantic fusion network to learn the joint embedding and two modality-specific networks for learning visual  ... 
doi:10.1109/access.2021.3052605 fatcat:budiej6vsvfvtki2pu2uzjudii