VIREO @ TRECVID 2012: Searching with Topology, Recounting with Small Concepts, Learning with Free Examples
2012
TREC Video Retrieval Evaluation
Instance Search (INS): We submitted four Bag-of-Words (BoW) based runs this year, mainly to test the proper way of exploiting spatial information by comparing weak geometric consistency (WGC) checking and ...
-F_X_NO_vireo_dtc_2: Spatial run with DT and background context modeling. Compared with vireo_dtcv, we do not use video-level fusion for this run. ...
Acknowledgment The work described in this paper was fully supported by two grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (CityU 119610 and CityU 118812). ...
dblp:conf/trecvid/0031TZYPN12
fatcat:5vplqjsezvhmhl7tcu3h2pwipm
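The run above compares weak geometric consistency (WGC) checking against stronger spatial models. As a minimal sketch of the general WGC idea (histogram voting over the rotation and log-scale differences of tentative matches, following Jégou et al.), with all function names and bin counts invented for illustration:

```python
import numpy as np

def wgc_score(angles_q, angles_d, scales_q, scales_d, n_bins=16):
    """Weak geometric consistency: matched keypoints vote for a quantized
    rotation and log-scale difference; the strongest bin is the score."""
    d_angle = (angles_d - angles_q) % (2 * np.pi)            # rotation per match
    d_scale = np.log2(scales_d / scales_q)                   # log-scale per match
    a_bins = np.floor(d_angle / (2 * np.pi) * n_bins).astype(int) % n_bins
    s_bins = np.clip(np.round(d_scale).astype(int) + 4, 0, 8)  # coarse scale bins
    hist = np.zeros((n_bins, 9))
    np.add.at(hist, (a_bins, s_bins), 1.0)
    return hist.max()                                        # peak = consistent votes

# Toy usage: 5 matches that agree on a 30-degree rotation and equal scale.
rng = np.random.default_rng(0)
aq = rng.uniform(0, 2 * np.pi, 5)
print(wgc_score(aq, aq + np.pi / 6, np.ones(5), np.ones(5)))  # -> 5.0
```

A high peak means many matches agree on one global rotation and scale, which is exactly the weak geometric evidence WGC rewards without estimating a full transformation.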
Nagoya University at TRECVID 2014: the Instance Search Task
2014
TREC Video Retrieval Evaluation
This paper presents our recent progress on a video object retrieval system that participated in the Instance Search (INS) task of the TRECVID 2014. ...
Basically the system is a further extension of our previous Bag-of-Words (BOW) framework, with emphasis this year on pursuing a practical spatial re-ranking method scalable to large video databases. ...
Introduction We address the problem of instance search (or visual object retrieval, equivalently) from videos, i.e., to rank database videos according to the probability of the existence of specific objects ...
dblp:conf/trecvid/ZhuZIST14
fatcat:5yxryjp52rg4vb3xemxqyvoy5m
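The entry above extends a BoW framework with scalable spatial re-ranking. Before any re-ranking, the first stage in such systems is typically tf-idf scoring over an inverted index; a hedged, self-contained sketch of that stage (all identifiers are invented, and the Nagoya system's actual scoring may differ):

```python
import numpy as np
from collections import defaultdict

def build_index(db_bows, vocab_size):
    """Inverted index + idf from per-video visual-word histograms."""
    index = defaultdict(list)                  # word -> [(video_id, tf), ...]
    df = np.zeros(vocab_size)
    for vid, bow in enumerate(db_bows):
        for word, tf in bow.items():
            index[word].append((vid, tf))
            df[word] += 1
    idf = np.log(len(db_bows) / np.maximum(df, 1))
    return index, idf

def search(query_bow, index, idf, n_videos):
    """Accumulate tf-idf scores; only posting lists of query words are touched."""
    scores = np.zeros(n_videos)
    for word, q_tf in query_bow.items():
        for vid, tf in index.get(word, []):
            scores[vid] += q_tf * tf * idf[word] ** 2
    return np.argsort(-scores)                 # ranked video ids

db = [{1: 2, 5: 1}, {1: 1, 7: 3}, {7: 1}]
index, idf = build_index(db, vocab_size=10)
print(search({7: 2}, index, idf, len(db)))     # video 1 ranks first
```

Only the short list returned by this stage would then be passed to the (more expensive) spatial re-ranking step.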
The MediaMill TRECVID 2012 Semantic Video Search Engine
2012
TREC Video Retrieval Evaluation
Our event detection and recounting experiments focus on representations using concept detectors. For instance search we study the influence of spatial verification and color invariance. ...
The MediaMill team participated in four tasks: semantic indexing, multimedia event detection, multimedia event recounting and instance search. ...
This research is supported by the STW SEARCHER project, the BeeldCanon project, FES COMMIT, and by the Intelligence Advanced Research Projects Activity (IARPA) via the Department of Interior National Business Center ...
dblp:conf/trecvid/SnoekSHKLMP0KS12
fatcat:tkgsy56yiremddcgodbajxppde
VIREO/ECNU @ TRECVID 2013: A Video Dance of Detection, Recounting and Search with Motion Relativity and Concept Learning from Wild
2013
TREC Video Retrieval Evaluation
Instance Search (INS): We submitted four runs in total, experimenting with three search paradigms for particular-object retrieval: (1) an elastic spatial consistency checking method; (2) background context modeling; and (3) object mining, which augments the results by exploring frequent instances in TV series. ...
Acknowledgment The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (CityU 118812). ...
dblp:conf/trecvid/NgoW0TSZY13
fatcat:jpbx3fr25jg35oiexaohdxntzy
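Of the three paradigms listed above, background context modeling is the easiest to sketch. One plausible reading, not the authors' exact formulation, is to reward similarity to the query's delineated object while damping similarity to its surroundings; `alpha` below is an assumed trade-off weight:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def context_aware_score(shot_bow, fg_bow, bg_bow, alpha=0.5):
    """Score a shot by its similarity to the query foreground (the delineated
    object) minus a damped similarity to the query's background ring.
    alpha is an assumed trade-off weight, not a value from the paper."""
    return cosine(shot_bow, fg_bow) - alpha * cosine(shot_bow, bg_bow)

rng = np.random.default_rng(1)
fg, bg = rng.random(100), rng.random(100)      # toy fg/bg word histograms
shot_obj = fg + 0.1 * rng.random(100)          # shot that contains the object
shot_ctx = bg + 0.1 * rng.random(100)          # shot that only shares context
print(context_aware_score(shot_obj, fg, bg), context_aware_score(shot_ctx, fg, bg))
```

The intended effect is that shots matching only the query's surroundings no longer crowd out shots containing the object itself.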
Searching visual instances with topology checking and context modeling
2013
Proceedings of the 3rd ACM conference on International conference on multimedia retrieval - ICMR '13
Based on the Bag-of-Words model, we propose two techniques tailored for Instance Search. ...
Instance Search (INS) is a realistic problem initiated by TRECVID, which is to retrieve all occurrences of the querying object, location, or person from a large video collection. ...
For general instance types, including non-rigid and non-planar objects, we elastically model the spatial topology with Delaunay Triangulation based visual-word matching [21]. ...
doi:10.1145/2461466.2461477
dblp:conf/mir/ZhangN13
fatcat:efo7g3f4lvhirdxcfmqojwgqxu
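In the spirit of the Delaunay Triangulation based topology modeling cited above (reference [21] in the entry), a hedged sketch using scipy: matches count as topologically consistent when the triangulation edges of the query keypoints are preserved among their counterparts in the database frame. This is a simplification, not the paper's exact algorithm:

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_edges(points):
    """Set of undirected edges in the Delaunay triangulation of 2-D points."""
    edges = set()
    for tri in Delaunay(points).simplices:
        for i in range(3):
            a, b = sorted((tri[i], tri[(i + 1) % 3]))
            edges.add((a, b))
    return edges

def topology_consistency(query_pts, db_pts):
    """Fraction of query triangulation edges preserved between the matched
    keypoint positions in the database frame (rows are match indices)."""
    q_edges = delaunay_edges(query_pts)
    return len(q_edges & delaunay_edges(db_pts)) / max(len(q_edges), 1)

q = np.random.default_rng(0).random((10, 2))
print(topology_consistency(q, q * 2.0 + 1.0))  # similarity map keeps topology -> 1.0
```

Because only neighborhood relations are checked, the score tolerates the non-rigid deformations that defeat a single global homography.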
NTT at TRECVID 2015: Instance Search
2015
TREC Video Retrieval Evaluation
Specifically, the combinations of methods in our runs are as follows, where BOVW indicates the bag-of-visual-words model: ... Among the runs, F_A_NTT_1 and F_A_NTT_2 achieved the highest mean average precisions ...
This system was tuned with the topics that were used for the instance search task of TRECVID 2014 and the BBC EastEnders dataset. ...
CONCLUSION In this report, we proposed two practical spatial verification methods called EWGR and AF for instance search from videos. ...
dblp:conf/trecvid/WuYSNKKHKK15
fatcat:n45pfh5nhren7berroyzhnr3a4
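EWGR and AF are the authors' own methods and their details are not reproduced here. As a generic baseline for the same spatial verification step, a sketch that counts RANSAC homography inliers among tentative matches with OpenCV (the reprojection threshold is an assumption):

```python
import numpy as np
import cv2  # assumes opencv-python is installed

def ransac_inliers(pts_q, pts_d, thresh=5.0):
    """Count tentative matches consistent with a single homography.
    pts_q, pts_d: (N, 2) arrays of matched keypoint coordinates, N >= 4."""
    H, mask = cv2.findHomography(pts_q.astype(np.float32),
                                 pts_d.astype(np.float32),
                                 cv2.RANSAC, thresh)
    return 0 if mask is None else int(mask.sum())

pts = np.random.default_rng(0).random((20, 2)) * 100
print(ransac_inliers(pts, pts + np.array([10.0, 5.0])))  # pure shift -> 20 inliers
```

Methods like those in the entry aim to approximate this kind of verification at a fraction of its cost, which is what makes them practical at video-database scale.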
IRIM at TRECVID 2014: Semantic Indexing and Instance Search
2014
TREC Video Retrieval Evaluation
This paper describes the IRIM consortium's participation in the TRECVID 2014 semantic indexing (SIN) and instance search (INS) tasks. ...
The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. ...
Acknowledgments This work has been carried out in the context of the IRIM (Indexation et Recherche d'Information Multimédia) group of the GDR-ISIS research network of CNRS. ...
dblp:conf/trecvid/BallasLBGPRMMBA14
fatcat:dikbbpghvndffaf4ogs2hjrk24
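The IRIM pipeline above ends with fusion of descriptor variants and higher-level fusion. A minimal sketch of the common weighted late-fusion pattern, normalizing each expert's scores before summing; the weights and score values are illustrative, not IRIM's:

```python
import numpy as np

def min_max(s):
    """Normalize one expert's scores to [0, 1] so experts are comparable."""
    return (s - s.min()) / (s.max() - s.min() + 1e-9)

def late_fusion(score_lists, weights):
    """Weighted sum of normalized per-descriptor score vectors."""
    return sum(w * min_max(np.asarray(s)) for s, w in zip(score_lists, weights))

color_scores = [0.2, 0.9, 0.4]      # e.g., a color-descriptor classifier
motion_scores = [10.0, 3.0, 8.0]    # e.g., a motion-descriptor classifier
print(late_fusion([color_scores, motion_scores], weights=[0.6, 0.4]))
```

The normalization step matters: without it, the expert with the largest raw score range silently dominates the fusion.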
Informedia @ TRECVID 2018: Ad-hoc Video Search, Video to Text Description, Activities in Extended video
2018
TREC Video Retrieval Evaluation
However, there are two main limitations of the most widely used cross-entropy (CE) function as the training target, namely exposure bias and mismatched targets in training and testing. ...
In this section of the notebook, we present our system for the TRECVID Video to Text description generation task. ...
We generate textually attended visual vectors t_v and visually attended text vectors v_t, then aggregate them to maximize the similarity for a given pair by computing the trace of inner products. ...
dblp:conf/trecvid/ChenCJH00VCLHLK18
fatcat:4hie3xjj65gwdeoe7odbddrmrq
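One plausible reading of the trace-of-inner-products aggregation described above, with bidirectional cross-attention between word and frame features; projection layers, temperatures, and other details of the actual system are omitted, so treat this as an assumption-laden sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def trace_similarity(T, V):
    """T: (n_words, d) text features; V: (n_frames, d) visual features.
    t_v: per-word attended visual vectors; v_t: per-frame attended text
    vectors. Pair similarity aggregates both directions via traces of
    inner-product matrices."""
    t_v = softmax(T @ V.T) @ V        # (n_words, d) textually attended visual
    v_t = softmax(V @ T.T) @ T        # (n_frames, d) visually attended text
    return np.trace(T @ t_v.T) / len(T) + np.trace(V @ v_t.T) / len(V)

rng = np.random.default_rng(0)
T, V = rng.standard_normal((7, 32)), rng.standard_normal((20, 32))
print(trace_similarity(T, V))
```

Training would push this scalar up for matching video-caption pairs and down for mismatched ones, which is how the attended vectors become informative.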
VIREO @ TRECVID 2014: Instance Search and Semantic Indexing
2014
TREC Video Retrieval Evaluation
Our baseline system is based on the Bag-of-Words (BoW) model [2], augmented with Hamming Embedding [3], spatial verification via Delaunay Triangulation [4], and context weighting via the "Stare" model [5]. ...
We submitted two runs to test these newly added features: -2B_M_D_VIREO.14_1: Late fusion of the detection scores using visual features. -2B_M_D_VIREO.14_2: Late fusion of the detection scores using visual ...
Science Foundation of China under Grant 61272290, and National Hi-Tech Research and Development Program (863 Program) of China under Grant 2014AA015102. ...
dblp:conf/trecvid/00310YLCN14
fatcat:jljwdvtvandnzob4kb3z33yiii
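As a hedged illustration of the Hamming Embedding component mentioned above [3]: descriptors assigned to the same visual word are compared by short binary signatures, and a match is kept only if the signatures are close. The per-cluster median thresholds of the real method are replaced here by a shared zero threshold for brevity:

```python
import numpy as np

def he_signatures(descs, proj, thresholds):
    """Project descriptors and binarize each dimension against a threshold.
    descs: (N, D); proj: (D, B) random orthogonal projection; returns (N, B)."""
    return descs @ proj > thresholds

def he_match(sig_q, sig_d, max_dist=24):
    """Accept a same-visual-word match only if binary signatures are close."""
    return int(np.count_nonzero(sig_q != sig_d)) <= max_dist

rng = np.random.default_rng(0)
proj = np.linalg.qr(rng.standard_normal((128, 64)))[0]  # 128-D -> 64 bits
thr = np.zeros(64)                                      # assumed: median = 0
d = rng.standard_normal((1, 128))
d_noisy = d + 0.05 * rng.standard_normal((1, 128))      # slightly perturbed copy
print(he_match(he_signatures(d, proj, thr)[0],
               he_signatures(d_noisy, proj, thr)[0]))   # -> True
```

The signatures filter out the many false matches that coarse visual-word quantization alone would let through, at the cost of a few bytes per descriptor.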
Deep Learning for Video Captioning: A Review
2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
One is to encode a video via thorough understanding and to learn a visual representation. ...
As a connection between the two worlds of vision and language, video captioning is the task of producing a natural-language utterance (usually a sentence) that describes the visual content of a video. ...
In the absence of spatial annotation, Shen et al. first adopted multiple instance learning to detect semantic concepts in video frames, and then selected spatial region sequences using submodular maximization ...
doi:10.24963/ijcai.2019/877
dblp:conf/ijcai/ChenYJ19
fatcat:3xxssrzqjjd5jbvtgkkp5lw7xa
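The review's snippet mentions selecting spatial region sequences by submodular maximization. A minimal sketch of the standard greedy algorithm for monotone submodular objectives; the region names and coverage objective below are invented for illustration:

```python
def greedy_submodular(candidates, gain, budget):
    """Standard greedy for monotone submodular maximization: repeatedly add
    the candidate with the largest marginal gain. Gives a (1 - 1/e) bound."""
    selected = []
    for _ in range(budget):
        best = max((c for c in candidates if c not in selected),
                   key=lambda c: gain(selected + [c]) - gain(selected))
        selected.append(best)
    return selected

# Toy objective: how many distinct concepts the selected regions cover.
regions = {"r1": {"person"}, "r2": {"person", "car"}, "r3": {"dog"}}

def cover(sel):
    return len(set().union(*(regions[r] for r in sel))) if sel else 0

print(greedy_submodular(list(regions), cover, budget=2))  # -> ['r2', 'r3']
```

Diminishing returns is what makes coverage-style objectives submodular, and it is the property the greedy guarantee relies on.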
MediaMill at TRECVID 2013: Searching Concepts, Objects, Instances and Events in Video
2013
TREC Video Retrieval Evaluation
For all tasks the starting point is our top-performing bag-of-words system of TRECVID 2008-2012, which uses color SIFT descriptors, average and difference coded into codebooks with spatial pyramids and ...
The MediaMill team participated in four tasks: concept detection, object localization, instance search, and event recognition. ...
This research is supported by the STW STORY project, the Dutch national program COMMIT, and by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business ...
dblp:conf/trecvid/SnoekSFHJKLMP0K13
fatcat:5jqwbazglzad5fxibp2o7423x4
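The bag-of-words system described above codes descriptors into codebooks with spatial pyramids. A hedged sketch of the classic spatial pyramid histogram (1x1 plus 2x2 grids; the MediaMill system's exact configuration may differ):

```python
import numpy as np

def spatial_pyramid_bow(words, xy, img_w, img_h, vocab, levels=(1, 2)):
    """Concatenate per-cell visual-word histograms over 1x1 and 2x2 grids,
    the classic spatial pyramid used on top of (color) SIFT codebooks."""
    feats = []
    for g in levels:                        # grid granularity g x g
        cell_x = np.minimum((xy[:, 0] / img_w * g).astype(int), g - 1)
        cell_y = np.minimum((xy[:, 1] / img_h * g).astype(int), g - 1)
        for cx in range(g):
            for cy in range(g):
                in_cell = (cell_x == cx) & (cell_y == cy)
                feats.append(np.bincount(words[in_cell], minlength=vocab))
    h = np.concatenate(feats).astype(float)
    return h / (h.sum() + 1e-9)             # L1-normalized pyramid histogram

rng = np.random.default_rng(0)
words = rng.integers(0, 50, 200)            # quantized descriptors
xy = rng.random((200, 2)) * [640, 480]      # keypoint locations
print(spatial_pyramid_bow(words, xy, 640, 480, vocab=50).shape)  # (250,)
```

The per-cell histograms give the otherwise orderless bag-of-words model a coarse notion of where in the frame each visual word occurs.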
A Simple Baseline for Audio-Visual Scene-Aware Dialog
2019
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven approach to learning virtual assistants, smart speakers and car navigation systems. ...
We evaluate the proposed approach on the recently introduced and challenging audio-visual scene-aware dataset, and demonstrate the key features that permit it to outperform the current state-of-the-art by ...
For instance, prediction of pose given audio input [60], learning of audio-visual object models from unlabeled video for audio source separation in novel videos [20, 51], use of video and audio data ...
doi:10.1109/cvpr.2019.01283
dblp:conf/cvpr/SchwartzSH19
fatcat:ghprtqitdbggtfxpejg4yfcjyq
Semantic Based Video Retrieval System: Survey
2018
Iraqi Journal of Science
In addition, it presents a generic review of techniques that have been proposed to solve the semantic gap, the major scientific problem in semantic-based video retrieval. ...
The semantic gap arises from the difference between the low-level features that are extracted from video content and the user's perception of these features in the real world. ...
[84] In this paper, the trajectory-based bag-of-visual-words pipeline is improved to retrieve video actions by combining spatial-temporal information. ...
doi:10.24996/ijs.2018.59.2a.12
fatcat:6fvq6pygqzglbptl4czxpzbjbm
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
[article]
2020
arXiv pre-print
Existing models lack adequate visual representation, due to the neglect of interaction between objects, and sufficient training for content-related words, due to the long-tailed problem. ...
Taking full advantage of the information from both vision and language is critical for the video captioning task. ...
In this work, we propose a graph-based approach, which constructs a temporal-spatial graph on all the objects in a video to enhance object-level representation. Visual Relational Reasoning. ...
arXiv:2002.11566v1
fatcat:r4qtu3uapfcdvixcvymi623s24
Object Relational Graph With Teacher-Recommended Learning for Video Captioning
2020
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Existing models lack adequate visual representation, due to the neglect of interaction between objects, and sufficient training for content-related words, due to the long-tailed problem. ...
Taking full advantage of the information from both vision and language is critical for the video captioning task. ...
In this work, we propose a graph-based approach, which constructs a temporal-spatial graph on all the objects in a video to enhance object-level representation. Visual Relational Reasoning. ...
doi:10.1109/cvpr42600.2020.01329
dblp:conf/cvpr/ZhangSY0WHZ20
fatcat:vxegivprwvgdhotyspwl3zo5ne
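For the temporal-spatial object graph described in the two entries above, a generic GCN-style aggregation step, not the paper's exact object relational graph formulation; the affinity matrix here is randomly generated as a stand-in for learned temporal-spatial relations:

```python
import numpy as np

def relational_aggregate(obj_feats, adj, W):
    """One step of graph aggregation over detected objects: each object's
    feature is enhanced by a normalized weighted sum of its neighbors.
    obj_feats: (N, d); adj: (N, N) temporal-spatial affinities; W: (d, d)."""
    adj = adj + np.eye(len(adj))                   # keep self-information
    norm = adj / adj.sum(axis=1, keepdims=True)    # row-normalize affinities
    return np.maximum(norm @ obj_feats @ W, 0.0)   # ReLU(A X W)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 16))                   # 6 objects across frames
A = (rng.random((6, 6)) > 0.5).astype(float)       # assumed affinity graph
print(relational_aggregate(X, A, rng.standard_normal((16, 16))).shape)  # (6, 16)
```

Feeding these interaction-aware object features to the caption decoder is what addresses the "neglect of interaction between objects" the abstract points out.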
Showing results 1 — 15 out of 7,172 results