Multi-Label Learning With Fused Multimodal Bi-Relational Graph

Jiejun Xu, Vignesh Jagadeesh, B. S. Manjunath
2014 IEEE transactions on multimedia  
Experimental results with our proposed method on two standard multi-label image datasets are very promising. Index Terms: graph-based semi-supervised learning, multi-label classification, multimodal.  ...  Such a representation allows for effective exploitation of both feature complementariness and label correlation. This contrasts with previous work where these two factors are considered in isolation.  ...  Ziyu Guan for his helpful discussions.  ... 
doi:10.1109/tmm.2013.2291218 fatcat:icqd2ejzpnf5rff5kfdkeuvv7u

Learning to name faces

Dayong Wang, Steven C.H. Hoi, Pengcheng Wu, Jianke Zhu, Ying He, Chunyan Miao
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
their weak labels for naming the query facial image.  ...  major components: (i) we enhance the weak labels of top-ranked similar images by exploiting the "label smoothness" assumption; (ii) we construct the multimodal representations of a facial image by extracting  ...  Algorithm for Learning to Name Faces In the above, we separately discuss the three key factors that affect the final annotation result of the proposed SBFA framework, including the label matrix Y , the  ... 
doi:10.1145/2484028.2484040 dblp:conf/sigir/WangHWZ0M13 fatcat:qiaiak4sivaqfmracgmnejmbza

Multimodal representation, indexing, automated annotation and retrieval of image collections via non-negative matrix factorization

Juan C. Caicedo, Jaafar BenAbdallah, Fabio A. González, Olfa Nasraoui
2012 Neurocomputing  
This paper presents a novel method based on non-negative matrix factorization to generate multimodal image representations that integrate visual features and text information.  ...  The proposed approach discovers a set of latent factors that correlate multimodal data in the same representation space.  ...  Two main requirements are herein considered to approximate the matrix factorization in Equation 2.  ... 
doi:10.1016/j.neucom.2011.04.037 fatcat:6joubook3jd5zljqxj34thjbna

Online Matrix Factorization for Space Embedding Multilabel Annotation [chapter]

Sebastian Otálora-Montenegro, Santiago A. Pérez-Rubiano, Fabio A. González
2013 Lecture Notes in Computer Science  
The paper presents an online matrix factorization algorithm for multilabel learning.  ...  This method addresses the multi-label annotation problem finding a joint embedding that represents both instances and labels in a common latent space.  ...  Radiológicas Usando Semántica Latente", "Diseño e implementación de un sistema de cómputo sobre recursos heterogéneos para la identificación de estructuras atmosféricas en predicción climatológica" and LACCIR "Multimodal  ... 
doi:10.1007/978-3-642-41822-8_43 fatcat:mfeovm7pnneetmmit5hj2rxlkq

Detection of Illicit Drug Trafficking Events on Instagram: A Deep Multimodal Multilabel Learning Approach [article]

Chuanbo Hu, Minglei Yin, Bin Liu, Xin Li, Yanfang Ye
2021 arXiv   pre-print
We have constructed a large-scale dataset MM-IDTE with manually annotated multiple drug labels to support fine-grained detection of illicit drugs.  ...  Specifically, our model takes text and image data as the input and combines multimodal information to predict multiple labels of illicit drugs.  ...  correlations [48], to exploiting label correlations for multi-label learning.  ... 
arXiv:2108.08920v1 fatcat:k6adlinv7baapkzbogr3vqzura

Multimodal Metric Learning for Tag-based Music Retrieval [article]

Minz Won, Sergio Oramas, Oriol Nieto, Fabien Gouyon, Xavier Serra
2020 arXiv   pre-print
Also, metric learning has already proven its suitability for cross-modal retrieval tasks in other domains (e.g., text-to-image) by jointly learning a multimodal embedding space.  ...  In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific  ...  This metric learning model with side information demonstrated its versatility in multi-label zero-shot annotation and retrieval tasks.  ... 
arXiv:2010.16030v1 fatcat:opjfd2xoc5avnhjgkmlf7e3f3u

Video Captioning with Guidance of Multimodal Latent Topics

Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
2017 Proceedings of the 2017 ACM on Multimedia Conference - MM '17  
For the topic prediction task, we use the mined topics as the teacher to train a student topic prediction model, which learns to predict the latent topics from multimodal contents of videos.  ...  We formulate the topic-aware caption generation as a multi-task learning problem, in which we add a parallel task, topic prediction, in addition to the caption task.  ...  So the 3-way factorization method [16, 24] is used to share parameters.  ... 
doi:10.1145/3123266.3123420 dblp:conf/mm/ChenCJH17 fatcat:st3ogxnthbczhnr7kygbgf7psu

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions [article]

Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha
2021 arXiv   pre-print
However, in real-world tasks, it is typically observed that one or more modalities are missing, noisy, lacking annotated data, have unreliable labels, and are scarce in training, testing, or both  ...  Our final goal is to discuss challenges and perspectives along with the important ideas and directions for future work that we hope will be beneficial for the entire research community focusing on this exciting  ...  One way to create a multimodal embedding is to project aligned data from multiple modalities into a common sub-space governed by a similarity matrix.  ... 
arXiv:2107.13782v2 fatcat:s4spofwxjndb7leqbcqnwbifq4

Multi modal semantic indexing for image retrieval

Pulla Chandrika, C. V. Jawahar
2010 Proceedings of the ACM International Conference on Image and Video Retrieval - CIVR '10  
In this paper, we propose two techniques: Multi-modal Latent Semantic Indexing (MMLSI) and Multi-Modal Probabilistic Latent Semantic Analysis (MMpLSA).  ...  The experimental results demonstrate an improved accuracy over other single and multi-modal methods.  ...  This is a naive way of managing multimode data. The disadvantages include shadowing of one mode by another by factors that include dictionary size, distribution etc.  ... 
doi:10.1145/1816041.1816091 dblp:conf/civr/PullaJ10 fatcat:zv5chquwufazxcocghtrt5hnpu

Logically at Factify 2022: Multimodal Fact Verification [article]

Jie Gao, Hella-Franziska Hoffmann, Stylianos Oikonomou, David Kiskovski, Anil Bandhakavi
2022 arXiv   pre-print
This paper describes our participant system for the multi-modal fact verification (Factify) challenge at AAAI 2022.  ...  Finally, we highlight challenges of the task and multimodal dataset for future research.  ...  Thus, the supported maximum sequence length and the optimum document context size are two of the key factors to be considered.  ... 
arXiv:2112.09253v2 fatcat:cn4xj4dcybcgrb3clpufkxepmq

Video Captioning with Guidance of Multimodal Latent Topics [article]

Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
2017 arXiv   pre-print
For the topic prediction task, we use the mined topics as the teacher to train a student topic prediction model, which learns to predict the latent topics from multimodal contents of videos.  ...  We formulate the topic-aware caption generation as a multi-task learning problem, in which we add a parallel task, topic prediction, in addition to the caption task.  ...  So the 3-way factorization method [16, 24] is used to share parameters.  ... 
arXiv:1708.09667v2 fatcat:pf5ybcxzhnfufmasqctsfe3xtu

Affective Computing for Large-scale Heterogeneous Multimedia Data

Sicheng Zhao, Shangfei Wang, Mohammad Soleymani, Dhiraj Joshi, Qiang Ji
2019 ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)  
., images, music, videos, and multimodal data, with the focus on both handcrafted features-based methods and deep learning methods.  ...  We briefly describe the available datasets for evaluating AC algorithms.  ...  Multimodal fusion can be done in model-based and model-agnostic ways.  ... 
doi:10.1145/3363560 fatcat:m56udtjlxrauvmj6d5z2r2zdeu

Knowledge Extraction And Representation Learning For Music Recommendation And Classification

Sergio Oramas, Xavier Serra
2017 Zenodo  
Next, we focus on learning new data representations from multimodal content using deep learning architectures, addressing the problems of cold-start music recommendation and multi-label music genre classification  ...  To this end, we first focus on the problem of linking music-related texts with online knowledge repositories and on the automated construction of music knowledge bases.  ...  Labels factorization: Let M be the binary matrix of items I and labels L, where m_ij = 1 if item i_i is annotated with label l_j and m_ij = 0 otherwise.  ... 
doi:10.5281/zenodo.1048497 fatcat:kdh5jhvocbh3riwln6n2f756su

Knowledge Extraction And Representation Learning For Music Recommendation And Classification

Sergio Oramas, Xavier Serra
2017 Zenodo  
Next, we focus on learning new data representations from multimodal content using deep learning architectures, addressing the problems of cold-start music recommendation and multi-label music genre classification  ...  To this end, we first focus on the problem of linking music-related texts with online knowledge repositories and on the automated construction of music knowledge bases.  ...  Labels factorization: Let M be the binary matrix of items I and labels L, where m_ij = 1 if item i_i is annotated with label l_j and m_ij = 0 otherwise.  ... 
doi:10.5281/zenodo.1100973 fatcat:yfpmc6qxbbakjp6qzvywyoaoci

Large Scale Image Indexing Using Online Non-negative Semantic Embedding [chapter]

Jorge A. Vanegas, Fabio A. González
2013 Lecture Notes in Computer Science  
This paper presents a novel method to address the problem of indexing a large set of images taking advantage of associated multimodal content such as text or tags.  ...  The principal advantage of the proposed method is its formulation as an online learning algorithm, which can scale to deal with large image collections.  ...  [6] propose multimodal matrix factorization algorithms based on SGD to decompose a training data set, and find correspondences between visual patterns and text terms in large image collection.  ... 
doi:10.1007/978-3-642-41822-8_46 fatcat:cnsejphfuzgmtmmpnejkjwo5ya