386 Hits in 4.1 sec

Latent Structure Mining with Contrastive Modality Fusion for Multimedia Recommendation [article]

Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Mengqi Zhang, Shu Wu, Liang Wang
2022 arXiv   pre-print
to fully understand content information and item relationships. To this end, we propose a latent structure MIning with ContRastive mOdality fusion method (MICRO for brevity).  ...  Previous studies focus on modeling user-item interactions with multimodal features included as side information. However, this scheme is not well-designed for multimedia recommendation.  ...  CONCLUSION In this paper, we have proposed the latent structure mining method (MICRO) for multimodal recommendation, which leverages graph structure learning to discover latent item relationships underlying  ... 
arXiv:2111.00678v2 fatcat:boqsb2twpjd45gbtol5tpkirqa
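
The snippet only names the technique; as a rough illustration of the kind of latent structure mining it refers to, the sketch below builds a kNN item-item graph from one modality's features and then averages per-modality graphs. All function names and the fixed fusion weights are hypothetical stand-ins, not the paper's actual method.

```python
import numpy as np

def knn_item_graph(modality_feats: np.ndarray, k: int = 10) -> np.ndarray:
    """Build a kNN item-item adjacency from one modality's features.

    modality_feats: (num_items, dim) feature matrix for a single modality.
    Returns a row-normalized adjacency keeping the k most similar items.
    """
    # cosine similarity between all item pairs
    normed = modality_feats / (np.linalg.norm(modality_feats, axis=1, keepdims=True) + 1e-12)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-loops

    adj = np.zeros_like(sim)
    topk = np.argpartition(-sim, k, axis=1)[:, :k]
    rows = np.arange(sim.shape[0])[:, None]
    adj[rows, topk] = sim[rows, topk]       # keep only the top-k similarities
    adj = np.maximum(adj, 0)                # drop negative similarities
    adj /= adj.sum(axis=1, keepdims=True) + 1e-12   # row-normalize
    return adj

# Hypothetical usage: combine per-modality graphs with fixed weights.
visual_adj = knn_item_graph(np.random.rand(100, 64))
text_adj   = knn_item_graph(np.random.rand(100, 32))
fused_adj  = 0.5 * visual_adj + 0.5 * text_adj
```

In a MICRO-style method the per-modality graphs would be fused with learned weights and refined with a contrastive modality-fusion objective rather than the fixed 0.5/0.5 average used here.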

Multi-modal Deep Analysis for Multimedia

Wenwu Zhu, Xin Wang, Hongzhi Li
2019 IEEE transactions on circuits and systems for video technology (Print)  
answering, multi-modal video summarization, multi-modal visual pattern mining and multi-modal recommendation.  ...  data and knowledge fusion: multi-modal fusion of data with domain knowledge.  ...  ACKNOWLEDGMENT We thank Guohao Li, Shengze Yu and Yitian Yuan for providing relevant materials and valuable opinions. This work would never have been accomplished without their useful suggestions.  ... 
doi:10.1109/tcsvt.2019.2940647 fatcat:l4tchrkgrnaeradvc4nhfan2w4

Landmark Reranking for Smart Travel Guide Systems by Combining and Analyzing Diverse Media

Junge Shen, Jialie Shen, Tao Mei, Xinbo Gao
2016 IEEE Transactions on Systems, Man & Cybernetics. Systems  
It is essential for a landmark ranking system to structure, analyze, and search heterogeneous travel information to produce results that match human expectations.  ...  In this paper, a novel landmark search system is introduced based on a newly designed heterogeneous information fusion scheme and a query-dependent landmark ranking strategy.  ...  modalities, which is, however, difficult to analyze for the fusion purpose.  ... 
doi:10.1109/tsmc.2016.2523948 fatcat:ikg6w33s7jg7nnky6tf2z2jy2m

Extracting Semantics from Multimedia Content: Challenges and Solutions [chapter]

Lexing Xie, Rong Yan
2008 Signals and Communication Technology  
correspondence across modalities, learning structured (generative) models to account for natural data dependency or model hidden topics, handling rare classes, leveraging unlabeled data, scaling to large  ...  We then present challenges for each of the five components along with their existing solutions: designing multimedia lexicons and using them for concept detection, handling multiple media sources and resolving  ...  In contrast, the late fusion methods directly fuse detection outputs after multiple uni-modal classifiers are generated. Neither of the fusion methods is perfect [83].  ... 
doi:10.1007/978-0-387-76569-3_2 fatcat:jul6fw7esfaurct6erjnvpcq6q
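
Since the snippet contrasts early and late fusion without showing either, here is a minimal, hypothetical sketch of the two schemes for a binary concept detector; the synthetic features and logistic-regression classifiers are placeholders for whatever detectors a real system would use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
visual = rng.normal(size=(200, 32))    # per-item visual features (synthetic)
text   = rng.normal(size=(200, 16))    # per-item text features (synthetic)
labels = rng.integers(0, 2, size=200)  # concept present / absent

# Early fusion: concatenate features, train a single classifier.
early_clf = LogisticRegression(max_iter=1000)
early_clf.fit(np.hstack([visual, text]), labels)

# Late fusion: train one classifier per modality, then combine their scores.
vis_clf = LogisticRegression(max_iter=1000).fit(visual, labels)
txt_clf = LogisticRegression(max_iter=1000).fit(text, labels)
late_scores = 0.5 * vis_clf.predict_proba(visual)[:, 1] \
            + 0.5 * txt_clf.predict_proba(text)[:, 1]
```

As the snippet notes, neither scheme is perfect: early fusion can capture cross-modal feature interactions but suffers when feature scales and dimensionalities differ, while late fusion is simpler and more modular but can only combine per-modality decisions.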

Affective Computing for Large-scale Heterogeneous Multimedia Data

Sicheng Zhao, Shangfei Wang, Mohammad Soleymani, Dhiraj Joshi, Qiang Ji
2019 ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)  
Finally, we discuss some challenges and future directions for multimedia affective computing.  ...  We then summarize and compare the representative methods on AC of different multimedia types, i.e., images, music, videos, and multimodal data, with the focus on both handcrafted features-based methods  ...  Most of the existing work on affective understanding of multimedia rely on one modality, even when additional modalities are available, for example in videos [55] .  ... 
doi:10.1145/3363560 fatcat:m56udtjlxrauvmj6d5z2r2zdeu

Combining Multi-modal Features for Social Media Analysis [chapter]

Spiros Nikolopoulos, Eirini Giannakidou, Ioannis Kompatsiaris, Ioannis Patras, Athena Vakali
2011 Social Media Modeling and Computing  
(i.e., early fusion), we present a bio-inspired algorithm for feature selection that weights the features based on their appropriateness to represent a resource.  ...  In this chapter we discuss methods for efficiently modeling the diverse information carried by social media.  ...  The problem of tag recommendation has been further studied in [28], where the authors suggest an approach for recommending tags by analyzing existing tags, visual context and user context in a multimedia  ... 
doi:10.1007/978-0-85729-436-4_4 fatcat:c5rsc2fi5zcglcmgtvsa5b6fca
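
The chapter's bio-inspired selection algorithm is not reproduced in the snippet; the sketch below only shows the general pattern of weighting per-modality feature blocks before early fusion, with a crude variance heuristic standing in for the actual weighting scheme. All names are illustrative.

```python
import numpy as np

def weighted_early_fusion(feature_blocks, weights=None):
    """Concatenate per-modality feature blocks after scaling each block.

    feature_blocks: list of (num_items, dim_m) arrays, one per modality.
    weights: optional per-modality weights; when omitted, a simple
             variance-based heuristic is used here as a placeholder for
             the chapter's bio-inspired feature-weighting algorithm.
    """
    if weights is None:
        weights = np.array([blk.var() for blk in feature_blocks])
        weights = weights / weights.sum()
    scaled = [w * blk for w, blk in zip(weights, feature_blocks)]
    return np.hstack(scaled)

# Hypothetical usage with tag and visual features for 50 resources.
tag_feats    = np.random.rand(50, 20)
visual_feats = np.random.rand(50, 40)
fused = weighted_early_fusion([tag_feats, visual_feats])
```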

2021 Index IEEE Transactions on Multimedia Vol. 23

2021 IEEE transactions on multimedia  
The Author Index contains the primary entry for each item, listed under the first author's name.  ...  ., +, TMM 2021 3059-3072 Attentive Cross-Modal Fusion Network for RGB-D Saliency Detection.  ...  ., +, TMM 2021 2428-2441 Latent Representation Learning Model for Multi-Band Images Fusion via Low-Rank and Sparse Embedding.  ... 
doi:10.1109/tmm.2022.3141947 fatcat:lil2nf3vd5ehbfgtslulu7y3lq

Multi-modal Graph Contrastive Learning for Micro-video Recommendation

Zixuan Yi, Xi Wang, Iadh Ounis, Craig Macdonald
2022 Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval  
Existing works in micro-video recommendation tend to unify the multi-modal channels, thereby treating each modality with equal importance.  ...  Indeed, such multimedia content can involve diverse modalities, often represented as visual, acoustic, and textual features to the recommender model.  ...  Challenging Negative Samples Hard negative mining has been effectively applied in multi-modal fusion scenarios where particular modalities tend to dominate the learned representations [11, 12, 21] ,  ... 
doi:10.1145/3477495.3532027 fatcat:44ybm5ouzndfldmicjvvt4tgyu
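
Neither the contrastive objective nor the hard-negative strategy is spelled out in the snippet; a minimal InfoNCE-style sketch between two modality views of the same item might look like the following, where harder (more similar) negatives naturally contribute more to the loss. Tensor names and the temperature value are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def multimodal_info_nce(anchor, positive, negatives, temperature=0.2):
    """InfoNCE loss between an anchor view and a positive view of the same item.

    anchor, positive: (batch, dim) embeddings from two modality views.
    negatives: (batch, num_neg, dim) candidate negatives; more similar
               (harder) negatives dominate the softmax denominator.
    """
    anchor    = F.normalize(anchor, dim=-1)
    positive  = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_logits = (anchor * positive).sum(-1, keepdim=True) / temperature      # (batch, 1)
    neg_logits = torch.einsum('bd,bnd->bn', anchor, negatives) / temperature  # (batch, num_neg)
    logits = torch.cat([pos_logits, neg_logits], dim=1)
    targets = torch.zeros(anchor.size(0), dtype=torch.long)  # positive sits at index 0
    return F.cross_entropy(logits, targets)

loss = multimodal_info_nce(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 5, 64))
```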

A Survey on Accuracy-oriented Neural Recommendation: From Collaborative Filtering to Information-rich Recommendation [article]

Le Wu, Xiangnan He, Xiang Wang, Kun Zhang, Meng Wang
2021 arXiv   pre-print
; and 3) temporal/sequential recommendation, which accounts for the contextual information associated with an interaction, such as time, location, and the past interactions.  ...  In this survey paper, we conduct a systematic review on neural recommender models from the perspective of recommendation modeling with the accuracy goal, aiming to summarize this field to facilitate researchers  ...  In contrast to the content-based recommendation models, with user-video interaction records, researchers proposed an Attentive Collaborative Filtering (ACF) model for multimedia recommendation [26] .  ... 
arXiv:2104.13030v3 fatcat:7bzwaxcarrgbhe36teik2rhl6e
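
The survey snippet only names ACF; as a rough, hypothetical sketch of its core idea of attention-pooling an item's multimedia components (e.g. video frames or image regions) into a user-specific item representation, one might write the following. The projection and pooling shown here are simplified stand-ins, not the published model.

```python
import torch
import torch.nn.functional as F

def attentive_item_embedding(user_emb, component_feats, proj):
    """Pool one item's components into a user-specific item embedding.

    user_emb: (dim,) embedding of the target user.
    component_feats: (num_components, dim) features of the item's components.
    proj: (dim, dim) projection; attention scores come from user-component affinity.
    """
    scores = component_feats @ proj @ user_emb   # (num_components,) affinities
    attn = F.softmax(scores, dim=0)              # user-specific attention weights
    return attn @ component_feats                # (dim,) attended item embedding

dim = 32
item_vec = attentive_item_embedding(torch.randn(dim),
                                     torch.randn(10, dim),
                                     torch.randn(dim, dim))
```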

Identifying Illicit Drug Dealers on Instagram with Large-scale Multimodal Data Fusion [article]

Chuanbo Hu, Minglei Yin, Bin Liu, Xin Li, Yanfang Ye
2021 arXiv   pre-print
We then design a quadruple-based multimodal fusion method to combine the multiple data sources associated with each user account for drug dealer identification.  ...  Moreover, we have developed a hashtag-based community detection technique for discovering evolving patterns, especially those related to geography and drug types.  ...  Multimodal fusion for multimedia analysis: a survey. Multimedia systems 16, 6 (2010), 345–379. [3] Geoffrey Barbier and Huan Liu. 2011. Data mining in social media.  ... 
arXiv:2108.08301v2 fatcat:r5omsmxaenfslcy6zdkt427ggq

Cross-Lingual Cross-Media Content Linking: Annotations and Joint Representations (Dagstuhl Seminar 15201)

Alexander G. Hauptmann, James Hodson, Juanzi Li, Nicu Sebe, Achim Rettinger, Marc Herbstritt
2015 Dagstuhl Reports  
For example, users watching TV tweet their opinions about a show.  ...  This kind of consumption poses new challenges and requires innovation in approaches to enhance content search and recommendation.  ...  Wang, "Learning Knowledge Bases for Text and Multimedia," ACM Multimedia 2014 Tutorial, 2014.  ... 
doi:10.4230/dagrep.5.5.43 dblp:journals/dagstuhl-reports/HauptmannHLSR15 fatcat:sjqptft2m5cufcpzslggt7la5i

A Survey on Food Computing [article]

Weiqing Min and Shuqiang Jiang and Linhu Liu and Yong Rui and Ramesh Jain
2019 arXiv   pre-print
Food computing acquires and analyzes heterogeneous food data from disparate sources for perception, recognition, retrieval, recommendation, and monitoring of food.  ...  Food is essential for human life and fundamental to the human experience.  ...  from different data sources into a unified multimedia food data fusion framework.  ... 
arXiv:1808.07202v5 fatcat:qjitfexaffd3fohfb7iy3lwfyi

Latent feature learning in social media network

Zhaoquan Yuan, Jitao Sang, Yan Liu, Changsheng Xu
2013 Proceedings of the 21st ACM international conference on Multimedia - MM '13  
We show that the derived latent features well embed both the media content and their observed links, leading to improvement in the social media tasks of user recommendation and social image annotation.  ...  Therefore, we propose to transfer the focus from model development to latent feature learning, and present a general feature learning framework based on the popular deep architecture.  ...  Since social media data of different modalities carry very different statistical distributions, the latent feature structure is quite complicated.  ... 
doi:10.1145/2502081.2502284 dblp:conf/mm/YuanSLX13 fatcat:6bn5pkxfqvgzbkz2tudzfjmkxm

Multimodal Marketing Intent Analysis for Effective Targeted Advertising

Lu Zhang, Jialie Jerry Shen, Jian Zhang, Jingsong Xu, Zhibin Li, Yazhou Yao, Litao Yu
2021 IEEE transactions on multimedia  
Her research interests include machine learning and deep learning for multimodal media data analysis. She has published several papers in top journals including TMM.  ...  These methods cannot be used directly for multimodal data fusion and inter-correlation mining.  ...  We propose a supervised SmiDocNADE to increase discriminative power for specific tasks and incorporate multimodal knowledge via a two-branch GCN to mine inter-correlations across modalities.  ... 
doi:10.1109/tmm.2021.3073267 fatcat:eecdpyhryvgfxmnjuxy3n6ehqi

New Ideas and Trends in Deep Multimodal Content Understanding: A Review

Wei Chen, Weiping Wang, Li Liu, Michael S. Lew
2020 Neurocomputing  
Finally, we include several promising directions for future research.  ...  The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and text.  ...  For other methods, including retrieval-and template-based methods, we recommend the existing surveys [23] [24] [25] .  ... 
doi:10.1016/j.neucom.2020.10.042 fatcat:hyjkj5enozfrvgzxy6avtbmoxu
Showing results 1 — 15 out of 386 results