3,446 Hits in 4.1 sec

Explicit Performance Metric Optimization for Fusion-Based Video Retrieval [chapter]

Ilseo Kim, Sangmin Oh, Byungki Byun, A. G. Amitha Perera, Chin-Hui Lee
2012 Lecture Notes in Computer Science  
We present a learning framework for fusion-based video retrieval systems that explicitly optimizes given performance metrics.  ...  In this work, a novel scheme to directly optimize such targeted performance metrics during learning is developed and presented.  ...  Related Work: With our focus on optimizing performance metrics for fusion classifiers in consumer video retrieval, there are three areas of related work. Performance Metric Optimization.  ... 
doi:10.1007/978-3-642-33885-4_40 fatcat:clqjkkhvfnhhzfzof2matngcoq

Active selection for multi-example querying by content

A.P. Natsev, J.R. Smith
2003 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)  
Multi-example content-based retrieval is therefore a simple alternative for modeling low- and mid-level semantics without the need for heavy user interaction or extensive training, as in interactive feedback  ...  Multi-example content-based retrieval (MECBR) is the process of querying content by specifying multiple query examples in a single query iteration.  ...  MULTI-EXAMPLE CONTENT-BASED RETRIEVAL Multi-example content-based retrieval (MECBR) is a generalization of traditional content-based retrieval (CBR).  ... 
doi:10.1109/icme.2003.1220950 dblp:conf/icmcs/NatsevS03 fatcat:jqijwllsyvffhlqsebh6ph26qy

Multimedia event detection with multimodal feature fusion and temporal concept localization

Sangmin Oh, Scott McCloskey, Ilseo Kim, Arash Vahdat, Kevin J. Cannons, Hossein Hajimirsadeghi, Greg Mori, A. G. Amitha Perera, Megha Pandey, Jason J. Corso
2013 Machine Vision and Applications  
The developed system characterizes complex multimedia events based on a large array of multimodal features, and classifies unseen videos by effectively fusing diverse responses.  ...  In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval by its use of high-level concepts and temporal evidence localization.  ...  Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright thereon.  ... 
doi:10.1007/s00138-013-0525-x fatcat:m5grko5ls5denhtst2btnwdmmy

Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss [article]

Xing Cheng, Hezheng Lin, Xiangyu Wu, Fan Yang, Dong Shen
2021 arXiv   pre-print
However, due to the heterogeneity of structures and contents between video and text, previous CLIP-based models are prone to overfitting in the training phase, resulting in relatively poor retrieval performance  ...  Further, with both of them, performance improves by a large margin, surpassing the previous SOTA methods by around 4.6% R@1 on MSR-VTT.  ...  So we test the metrics from the embedding space produced by each expert. Referring to Table 7, the fusion expert matches the whole sentence and performs best.  ... 
arXiv:2109.04290v3 fatcat:3nh7fdmsyrae7fdpfedyvfgc3y
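The R@1 figure quoted above is the standard text-to-video retrieval metric: the fraction of queries whose ground-truth video is ranked first. A minimal sketch of Recall@K under the usual protocol that text query i matches video i (the similarity matrix below is a toy example, not data from the paper):

```python
import numpy as np

def recall_at_k(sim, k=1):
    """Recall@K for text-to-video retrieval: sim[i, j] is the similarity
    of text query i to video j; the correct video for query i is video i."""
    # Rank of the ground-truth video = number of videos scored strictly
    # higher than the matching one.
    ranks = (sim > np.diag(sim)[:, None]).sum(axis=1)
    return float((ranks < k).mean())

# Toy 3-query example: queries 0 and 2 rank their own video first,
# query 1 does not, so R@1 = 2/3.
sim = np.array([[0.9, 0.1, 0.2],
                [0.8, 0.3, 0.4],
                [0.1, 0.2, 0.7]])
```

With this matrix, `recall_at_k(sim, 1)` gives 2/3 and `recall_at_k(sim, 3)` gives 1.0; reported gains such as "+4.6% R@1" compare this quantity between systems.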

Multimodal Machine Learning: A Survey and Taxonomy [article]

Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency
2017 arXiv   pre-print
We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and  ...  This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.  ...  optimized for retrieval.  ... 
arXiv:1705.09406v2 fatcat:262fo4sihffvxecg4nwsifoddm

Scalable Mobile Video Retrieval with Sparse Projection Learning and Pseudo Label Mining

Guan-Long Wu, Yin-Hsi Kuo, Tzu-Hsuan Chiu, Winston H. Hsu, Lexing Xie
2013 IEEE Multimedia  
The average query time on 100K videos is only 0.592 seconds. Index Terms: hashing, sparsity, mobile video retrieval, explicit semantic analysis.  ...  To alleviate the need for expensive annotation for hash learning, we investigate varying approaches to pseudo label mining, where explicit semantic analysis leverages Wikipedia and performs best.  ...  Our implementation is based on Python, and the L2 and Hamming distance calculation functions are optimized with Cython to achieve performance competitive with a native C implementation.  ... 
doi:10.1109/mmul.2013.13 fatcat:p7lstaewyvb7nanycftda7vtlq
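The snippet above mentions Cython-accelerated L2 and Hamming distance routines. As a rough illustration (plain vectorized NumPy rather than Cython, with made-up toy data), the two distance computations look like:

```python
import numpy as np

def l2_distances(query, db):
    """Euclidean (L2) distances from one query vector to every row of db."""
    return np.sqrt(((db - query) ** 2).sum(axis=1))

def hamming_distances(query_code, db_codes):
    """Hamming distances between bit-packed hash codes (uint8 arrays):
    XOR the codes, then count the set bits."""
    xor = np.bitwise_xor(db_codes, query_code)
    return np.unpackbits(xor, axis=1).sum(axis=1)

# Toy example: two 2-D real vectors and two 8-bit hash codes.
real_db = np.array([[3.0, 4.0], [0.0, 0.0]])
dists = l2_distances(np.array([0.0, 0.0]), real_db)  # [5.0, 0.0]

codes = np.array([[0b00001111], [0b11110000]], dtype=np.uint8)
hams = hamming_distances(np.array([0b11110000], dtype=np.uint8), codes)  # [8, 0]
```

The XOR-plus-popcount form of the Hamming distance is what makes binary hash codes fast to compare at scale; a compiled (Cython or C) popcount over packed words avoids the bit-unpacking step used here for clarity.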

Multimedia search reranking

Tao Mei, Yong Rui, Shipeng Li, Qi Tian
2014 ACM Computing Surveys  
We also discuss relevant issues such as data collection, evaluation metrics, and benchmarking. We conclude with several promising directions for future research.  ...  Multimedia search re-ranking, which reorders visual documents based on multimodal cues to improve initial text-only searches, has received increasing attention in recent years.  ...  Ting Yao, for their insightful discussions. The authors also would like to thank the anonymous reviewers for their valuable comments. This work was supported in part to Dr.  ... 
doi:10.1145/2536798 fatcat:6kmga3jo4fa2tp4354wukfuzja

Semantic Based Video Retrieval System: Survey

2018 Iraqi Journal of Science  
The video retrieval system is used for finding a user's desired video among the huge number of videos available on the Internet or in a database.  ...  In addition, it presents a generic review of techniques that have been proposed to solve the semantic gap, the major scientific problem in semantic-based video retrieval.  ...  [62] Content-based video retrieval systems (2014); [63] Multimodal feature extraction for semantic mining of soccer video (2015); [64] Reducing semantic gap in video retrieval with fusion (2016); [28  ... 
doi:10.24996/ijs.2018.59.2a.12 fatcat:6fvq6pygqzglbptl4czxpzbjbm

Automatic discovery of query-class-dependent models for multimodal search

Lyndon S. Kennedy, Apostol (Paul) Natsev, Shih-Fu Chang
2005 Proceedings of the 13th annual ACM international conference on Multimedia - MULTIMEDIA '05  
similar fusion strategies for the combination of unimodal components for multimodal search.  ...  We develop a framework for the automatic discovery of query classes for query-class-dependent search models in multimodal retrieval.  ...  In future work, we will explore ways to discover the best search methods and fusion strategies, through similar performance-based metrics.  ... 
doi:10.1145/1101149.1101339 dblp:conf/mm/KennedyNC05 fatcat:g6lspob5vfemfl7zcewuywqsvi

Multimodal music information processing and retrieval: survey and future challenges [article]

Federico Simonetta, Stavros Ntalampiras, Federico Avanzini
2019 arXiv   pre-print
Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus  ...  First, we categorize the related literature based on the applications they address.  ...  In video, one can also detect shots, for example by analyzing the variation of the color histograms across video frames, using the Kullback-Leibler distance [68] or other metrics.  ... 
arXiv:1902.05347v1 fatcat:i2indkxk3vcmxajn6ajkh56wva
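The shot-detection idea mentioned in this snippet (comparing color histograms of consecutive frames via the Kullback-Leibler distance) can be sketched as follows; the 16-bin intensity histogram, the smoothing constant, and the threshold are illustrative assumptions, not parameters from the survey:

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Normalized intensity histogram of a frame (H x W array, values 0-255)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    hist = hist.astype(float) + 1e-8  # smooth away empty bins for the log
    return hist / hist.sum()

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two histograms."""
    return float(np.sum(p * np.log(p / q)))

def shot_boundaries(frames, threshold=0.5):
    """Frame indices where the KL divergence between consecutive frame
    histograms exceeds the threshold, i.e. candidate shot cuts."""
    hists = [color_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if kl_divergence(hists[i], hists[i - 1]) > threshold]

# Toy clip: three dark frames followed by three bright frames;
# the only histogram jump is at frame index 3.
frames = [np.zeros((8, 8))] * 3 + [np.full((8, 8), 255.0)] * 3
cuts = shot_boundaries(frames)  # [3]
```

Real systems typically use per-channel color histograms and adaptive thresholds, but the structure (per-frame histogram, pairwise divergence, threshold) is the same.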

Tag relevance fusion for social image retrieval

Xirong Li
2014 Multimedia Systems  
This finding suggests the potential of tag relevance fusion for real-world deployment.  ...  Experiments on a large present-day benchmark set show that tag relevance fusion leads to better image retrieval.  ...  Marcel Worring for their comments and suggestions on this work.  ... 
doi:10.1007/s00530-014-0430-9 fatcat:5z36n5pbjffmjc2aw3n3bamx5u

Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues

W. H. Adams, Giridharan Iyengar, Ching-Yung Lin, Milind Ramesh Naphade, Chalapathy Neti, Harriet J. Nock, John R. Smith
2003 EURASIP Journal on Advances in Signal Processing  
In this paper we present a learning-based approach to semantic indexing of multimedia content using cues derived from audio, visual, and text features.  ...  We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of concepts in the lexicon.  ...  Figure 8(b) (precision-recall, human-knowledge-based query): implicit fusion outperforms explicit fusion by around 0.15.  ... 
doi:10.1155/s1110865703211173 fatcat:rwkygctgzjfx3fey7e722djxgq

Per-Exemplar Fusion Learning for Video Retrieval and Recounting

Ilseo Kim, Sangmin Oh, A.G. Amitha Perera, Chin-Hui Lee
2012 2012 IEEE International Conference on Multimedia and Expo  
We propose a novel video retrieval framework based on an extension of per-exemplar learning [7].  ...  In particular, for every exemplar, the relevance of each feature type is discriminatively analyzed and the effect of less informative features is minimized during the fusion-based associations.  ...  Performance comparison for video retrieval by different approaches including base classifiers without fusion, k-NN, fusion by a single SVM, and per-exemplar fusion.  ... 
doi:10.1109/icme.2012.150 dblp:conf/icmcs/KimOPL12 fatcat:inctd6tcqnbxbk5vfx3dyda2xy

Multimodal Sparse Linear Integration for Content-Based Item Recommendation

Qiusha Zhu, Zhao Li, Haohong Wang, Yimin Yang, Mei-Ling Shyu
2013 2013 IEEE International Symposium on Multimedia  
In this paper, an effective method, MSLIM, is proposed to integrate multimodal information for content-based item recommendation.  ...  Most content-based recommender systems focus on analyzing the textual information of items. For items with images, these images can be treated as another information modality.  ...  A comparison between early fusion and late fusion is done in [15], and experiments on broadcast videos for video semantic concept detection show that late fusion tends to slightly outperform early fusion  ... 
doi:10.1109/ism.2013.37 dblp:conf/ism/ZhuLWYS13 fatcat:ld5ueesmfzdzxozcxahbolqpuu

Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [article]

Sicheng Zhao, Guoli Jia, Jufeng Yang, Guiguang Ding, Kurt Keutzer
2021 arXiv   pre-print
Furthermore, we present some representative approaches on representation learning of each affective modality, feature fusion of different affective modalities, classifier optimization for MER, and domain adaptation for MER.  ...  Higher values indicate better performance for all the metrics, except M, where lower values denote better performance.  ... 
arXiv:2108.10152v1 fatcat:hwnq7hoiqba3pdf6aakcxjq33i