Explicit Performance Metric Optimization for Fusion-Based Video Retrieval
[chapter]
2012
Lecture Notes in Computer Science
We present a learning framework for fusion-based video retrieval systems that explicitly optimizes given performance metrics. ...
In this work, a novel scheme to directly optimize such targeted performance metrics during learning is developed and presented. ...
Related Work With our focus on optimizing performance metrics for fusion classifiers in consumer video retrieval, there are three areas of related work. Performance Metric Optimization. ...
doi:10.1007/978-3-642-33885-4_40
fatcat:clqjkkhvfnhhzfzof2matngcoq
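The entry above describes learning fusion weights so that a targeted retrieval metric is optimized directly rather than a surrogate loss. As a point of reference, here is a minimal sketch of average precision, one such ranking metric, computed from fused scores; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def average_precision(scores, labels):
    """Average precision of a ranked list: one common retrieval metric
    that metric-optimizing fusion learners target directly."""
    order = np.argsort(-np.asarray(scores))            # rank by descending fusion score
    labels = np.asarray(labels)[order]
    hits = np.cumsum(labels)                           # relevant items seen so far
    precision_at_k = hits / (np.arange(len(labels)) + 1)
    return float((precision_at_k * labels).sum() / max(labels.sum(), 1))

# Example: fused scores for five videos, two of which are relevant.
print(average_precision([0.9, 0.2, 0.75, 0.4, 0.1], [1, 0, 1, 0, 0]))  # 1.0
```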
Active selection for multi-example querying by content
2003
2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)
Multi-example content-based retrieval is therefore a simple alternative for modeling low- and mid-level semantics without the need for heavy user interaction or extensive training, as in interactive feedback ...
Multi-example content-based retrieval (MECBR) is the process of querying content by specifying multiple query examples with a single query iteration. ...
Multi-example content-based retrieval (MECBR) is a generalization of traditional content-based retrieval (CBR). ...
doi:10.1109/icme.2003.1220950
dblp:conf/icmcs/NatsevS03
fatcat:jqijwllsyvffhlqsebh6ph26qy
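The snippet above defines MECBR as querying with several examples in a single iteration. A minimal sketch of the idea, assuming cosine similarity and simple max/mean fusion of the per-example scores (the paper's actual scoring and fusion strategy may differ):

```python
import numpy as np

def mecbr_scores(query_feats, db_feats, fusion="max"):
    """Score database items against several query examples and fuse the
    per-example similarities (a common MECBR-style strategy)."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    d = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = d @ q.T                          # (n_db, n_queries) cosine similarities
    return sims.max(axis=1) if fusion == "max" else sims.mean(axis=1)

rng = np.random.default_rng(0)
queries, database = rng.normal(size=(3, 64)), rng.normal(size=(100, 64))
ranking = np.argsort(-mecbr_scores(queries, database))   # best matches first
```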
Multimedia event detection with multimodal feature fusion and temporal concept localization
2013
Machine Vision and Applications
The developed system characterizes complex multimedia events based on a large array of multimodal features, and classifies unseen videos by effectively fusing diverse responses. ...
In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval by its use of high-level concepts and temporal evidence localization. ...
doi:10.1007/s00138-013-0525-x
fatcat:m5grko5ls5denhtst2btnwdmmy
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss
[article]
2021
arXiv
pre-print
However, due to the heterogeneity of structure and content between video and text, previous CLIP-based models are prone to overfitting during training, resulting in relatively poor retrieval performance ...
Further, with both of them, performance improves to a large extent, surpassing the previous SOTA methods by around 4.6% R@1 on MSR-VTT. ...
We therefore evaluate the metrics on the embedding space produced by each expert. Referring to Table 7, the fusion expert, which matches against the whole sentence, performs best. ...
arXiv:2109.04290v3
fatcat:3nh7fdmsyrae7fdpfedyvfgc3y
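The abstract above refers to a dual softmax loss, in which the softmax over one retrieval direction acts as a prior on the similarities before the contrastive objective is applied. A rough, illustrative PyTorch sketch of that idea follows; the temperature value, weighting, and names are assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def dual_softmax_loss(sim, temperature=100.0):
    """Sketch of a dual-softmax-style retrieval loss: the softmax over the
    opposite direction of a video-text similarity matrix re-weights the
    logits before the usual cross-entropy over matched pairs. Illustrative only."""
    prior_cols = F.softmax(sim * temperature, dim=0)   # column-wise (text->video) prior
    prior_rows = F.softmax(sim * temperature, dim=1)   # row-wise (video->text) prior
    logits_v2t = sim * prior_cols                      # video-to-text retrieval logits
    logits_t2v = (sim * prior_rows).t()                # text-to-video retrieval logits
    targets = torch.arange(sim.size(0), device=sim.device)  # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits_v2t * temperature, targets) +
                  F.cross_entropy(logits_t2v * temperature, targets))

sim = torch.randn(8, 8)        # batch of video-text similarities (square batch)
loss = dual_softmax_loss(sim)
```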
Multimodal Machine Learning: A Survey and Taxonomy
[article]
2017
arXiv
pre-print
We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and ...
This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research. ...
optimized for retrieval. ...
arXiv:1705.09406v2
fatcat:262fo4sihffvxecg4nwsifoddm
Scalable Mobile Video Retrieval with Sparse Projection Learning and Pseudo Label Mining
2013
IEEE Multimedia
The average query time on 100K videos is only 0.592 seconds. Index terms: hashing, sparsity, mobile video retrieval, explicit semantic analysis. ...
To alleviate the need for expensive annotation for hash learning, we investigate varying approaches for pseudo label mining, where explicit semantic analysis leverages Wikipedia and performs the best. ...
Our implementation is based on Python, and the L2 and Hamming distance functions are optimized with Cython to achieve performance competitive with a native C implementation. ...
doi:10.1109/mmul.2013.13
fatcat:p7lstaewyvb7nanycftda7vtlq
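The snippet above mentions Cython-optimized L2 and Hamming distance routines for hash-based retrieval. As an illustrative baseline (not the authors' code), a plain NumPy Hamming distance over bit-packed codes might look like this:

```python
import numpy as np

def hamming_distances(query_code, db_codes):
    """Hamming distance between one packed binary hash code and a database of
    codes (uint8-packed bits): the hot loop such systems move to Cython/C."""
    xor = np.bitwise_xor(db_codes, query_code)          # differing bits per code
    return np.unpackbits(xor, axis=1).sum(axis=1)       # popcount per row

rng = np.random.default_rng(0)
db = rng.integers(0, 256, size=(100_000, 8), dtype=np.uint8)   # 64-bit codes
q = rng.integers(0, 256, size=8, dtype=np.uint8)
nearest = np.argsort(hamming_distances(q, db))[:10]            # top-10 by Hamming distance
```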
Multimedia search reranking
2014
ACM Computing Surveys
We also discuss relevant issues such as data collection, evaluation metrics, and benchmarking. We conclude with several promising directions for future research. ...
Multimedia search re-ranking, which reorders visual documents based on multimodal cues to improve initial text-only searches, has received increasing attention in recent years. ...
doi:10.1145/2536798
fatcat:6kmga3jo4fa2tp4354wukfuzja
Semantic Based Video Retrieval System: Survey
2018
Iraqi Journal of Science
A video retrieval system is used to find the user's desired video among the huge number of videos available on the Internet or in a database. ...
In addition, it presents a generic review of techniques that have been proposed to solve the semantic gap, the major scientific problem in semantic-based video retrieval. ...
[62] Content based video retrieval systems. 2014 [63] Multimodal feature extraction for semantic mining of soccer video. 2015 [64] Reducing semantic gap in video retrieval with fusion. 2016 [28 ...
doi:10.24996/ijs.2018.59.2a.12
fatcat:6fvq6pygqzglbptl4czxpzbjbm
Automatic discovery of query-class-dependent models for multimodal search
2005
Proceedings of the 13th annual ACM international conference on Multimedia - MULTIMEDIA '05
similar fusion strategies for the combination of unimodal components for multimodal search. ...
We develop a framework for the automatic discovery of query classes for query-class-dependent search models in multimodal retrieval. ...
In future work, we will explore ways to discover the best search methods and fusion strategies, through similar performance-based metrics. ...
doi:10.1145/1101149.1101339
dblp:conf/mm/KennedyNC05
fatcat:g6lspob5vfemfl7zcewuywqsvi
Multimodal music information processing and retrieval: survey and future challenges
[article]
2019
arXiv
pre-print
Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that Music Information Retrieval and Sound and Music Computing research communities should focus ...
First, we categorize the related literature based on the application they address. ...
In video, one can also detect shots, for example by analyzing the variation of the color histograms in the video frames, using the Kullback-Leibler distance [68] or other metrics. ...
arXiv:1902.05347v1
fatcat:i2indkxk3vcmxajn6ajkh56wva
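The snippet above mentions shot detection by comparing color histograms of successive frames with the Kullback-Leibler distance. A minimal sketch of that scheme, with an arbitrary threshold chosen purely for illustration:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """Kullback-Leibler divergence between two normalized histograms."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def shot_boundaries(frame_histograms, threshold=0.5):
    """Flag a shot cut wherever consecutive color histograms diverge strongly;
    an illustrative version of the histogram-based detection the survey cites."""
    return [i + 1 for i in range(len(frame_histograms) - 1)
            if kl_divergence(frame_histograms[i + 1], frame_histograms[i]) > threshold]
```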
Tag relevance fusion for social image retrieval
2014
Multimedia Systems
This finding suggests the potential of tag relevance fusion for real-world deployment. ...
Experiments on a large present-day benchmark set show that tag relevance fusion leads to better image retrieval. ...
doi:10.1007/s00530-014-0430-9
fatcat:5z36n5pbjffmjc2aw3n3bamx5u
Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues
2003
EURASIP Journal on Advances in Signal Processing
In this paper we present a learning-based approach to semantic indexing of multimedia content using cues derived from audio, visual and text features. ...
We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of concepts in the lexicon. ...
Figure 8(b), Precision-Recall for human-knowledge-based queries, Implicit vs. Explicit Fusion: implicit fusion outperforms explicit fusion by around 0.15. ...
doi:10.1155/s1110865703211173
fatcat:rwkygctgzjfx3fey7e722djxgq
Per-Exemplar Fusion Learning for Video Retrieval and Recounting
2012
2012 IEEE International Conference on Multimedia and Expo
We propose a novel video retrieval framework based on an extension of per-exemplar learning [7]. ...
In particular, for every exemplar, relevance of each feature type is discriminatively analyzed and the effect of less informative features is minimized during the fusion-based associations. ...
Performance comparison for video retrieval by different approaches including base classifiers without fusion, k-NN, fusion by a single SVM, and per-exemplar fusion. ...
doi:10.1109/icme.2012.150
dblp:conf/icmcs/KimOPL12
fatcat:inctd6tcqnbxbk5vfx3dyda2xy
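The entry above weights each feature type per exemplar so that uninformative features contribute little to the fused association. A schematic of how such per-exemplar weights would combine per-feature similarities (not the paper's learning procedure; the weights here are supplied, not learned):

```python
import numpy as np

def per_exemplar_fusion_score(feature_sims, exemplar_weights):
    """Combine per-feature-type similarities to a single exemplar using weights
    specific to that exemplar, down-weighting uninformative features."""
    w = np.asarray(exemplar_weights, dtype=float)
    w = w / w.sum()                                   # normalize the exemplar's weights
    return float(np.dot(w, feature_sims))

# Similarities of a test clip to one exemplar under three feature types
# (appearance, motion, audio), with audio judged uninformative for this exemplar.
print(per_exemplar_fusion_score([0.8, 0.6, 0.1], [0.5, 0.45, 0.05]))  # 0.675
```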
Multimodal Sparse Linear Integration for Content-Based Item Recommendation
2013
2013 IEEE International Symposium on Multimedia
In this paper, an effective method MSLIM is proposed to integrate multimodal information for content-based item recommendation. ...
Most content-based recommender systems focus on analyzing the textual information of items. For items with images, these images can be treated as another information modality. ...
A comparison between early fusion and late fusion is made in [15], and experiments on broadcast videos for video semantic concept detection show that late fusion tends to slightly outperform early fusion ...
doi:10.1109/ism.2013.37
dblp:conf/ism/ZhuLWYS13
fatcat:ld5ueesmfzdzxozcxahbolqpuu
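The snippet above contrasts early and late fusion as reported in [15]. A compact illustration of the two strategies on synthetic two-modality data, using scikit-learn (the data and classifiers are placeholders, not the cited experimental setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-modality data (e.g., visual and text features).
rng = np.random.default_rng(0)
X_vis, X_txt = rng.normal(size=(200, 32)), rng.normal(size=(200, 16))
y = (X_vis[:, 0] + X_txt[:, 0] > 0).astype(int)

# Early fusion: concatenate features, train a single classifier.
early = LogisticRegression(max_iter=1000).fit(np.hstack([X_vis, X_txt]), y)

# Late fusion: train one classifier per modality, then average their scores.
clf_vis = LogisticRegression(max_iter=1000).fit(X_vis, y)
clf_txt = LogisticRegression(max_iter=1000).fit(X_txt, y)
late_scores = 0.5 * (clf_vis.predict_proba(X_vis)[:, 1] +
                     clf_txt.predict_proba(X_txt)[:, 1])
```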
Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies
[article]
2021
arXiv
pre-print
Furthermore, we present some representative approaches to representation learning for each affective modality, feature fusion of different affective modalities, classifier optimization for MER, and domain adaptation for MER. ...
Higher values indicate better performance for all the metrics, except M where lower values denote better performance. ...
arXiv:2108.10152v1
fatcat:hwnq7hoiqba3pdf6aakcxjq33i
Showing results 1 — 15 out of 3,446 results