7,172 Hits in 4.8 sec

VIREO @ TRECVID 2012: Searching with Topology, Recounting with Small Concepts, Learning with Free Examples

Wei Zhang, Chun Chet Tan, Shiai Zhu, Ting Yao, Lei Pang, Chong-Wah Ngo
2012 TREC Video Retrieval Evaluation  
Instance Search (INS): We submitted four Bag-of-Words (BoW) based runs this year, mainly to test the proper way of exploiting spatial information by comparing weak geometric consistency (WGC) checking and  ...  -F X NO vireo dtc 2: Spatial run with DT and background context modeling. Compared with vireo dtcv, we do not use video-level fusion for this run.  ...  Acknowledgment The work described in this paper was fully supported by two grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (CityU 119610 and CityU 118812).  ... 
dblp:conf/trecvid/0031TZYPN12 fatcat:5vplqjsezvhmhl7tcu3h2pwipm
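As background for the BoW-based runs these entries describe, the core representation can be sketched in a few lines. This is an illustrative toy, not the VIREO system: the codebook size, clustering method (scikit-learn KMeans), and function names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Minimal bag-of-visual-words sketch: local descriptors are clustered into a
# codebook of "visual words", and a frame is then represented by the
# L1-normalized histogram of its word assignments.

def build_codebook(descriptors, k=8, seed=0):
    """Cluster local descriptors into k visual words."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(descriptors)

def bow_histogram(codebook, descriptors):
    """Quantize descriptors to their nearest word; return a normalized histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()
```

Real instance-search systems use far larger vocabularies (hundreds of thousands of words) with approximate nearest-neighbor quantization; the histogram comparison is then typically done through an inverted index rather than dense vectors.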

Nagoya University at TRECVID 2014: the Instance Search Task

Cai-Zhi Zhu, Yinqiang Zheng, Ichiro Ide, Shin'ichi Satoh, Kazuya Takeda
2014 TREC Video Retrieval Evaluation  
This paper presents our recent progress on a video object retrieval system that participated in the Instance Search (INS) task of the TRECVID 2014.  ...  Basically the system is a further extension of our previous Bag-of-Words (BOW) framework, with emphasis on pursuing a practical spatial re-ranking method scalable to large video database this year.  ...  Introduction We address the problem of instance search (or visual object retrieval, equivalently) from videos, i.e., to rank database videos according to the probability of the existence of specific objects  ... 
dblp:conf/trecvid/ZhuZIST14 fatcat:5yxryjp52rg4vb3xemxqyvoy5m

The MediaMill TRECVID 2012 Semantic Video Search Engine

Cees G. M. Snoek, Koen E. A. van de Sande, AmirHossein Habibian, Svetlana Kordumova, Zhenyang Li, Masoud Mazloom, Silvia L. Pintea, Ran Tao, Dennis C. Koelma, Arnold W. M. Smeulders
2012 TREC Video Retrieval Evaluation  
Our event detection and recounting experiments focus on representations using concept detectors. For instance search we study the influence of spatial verification and color invariance.  ...  The MediaMill team participated in four tasks: semantic indexing, multimedia event detection, multimedia event recounting and instance search.  ...  This research is supported by the STW SEARCHER project, the BeeldCanon project, FES COMMIT, and by the Intelligence Advanced Research Projects Activity (IARPA) via the Department of Interior National Business Center  ... 
dblp:conf/trecvid/SnoekSHKLMP0KS12 fatcat:tkgsy56yiremddcgodbajxppde

VIREO/ECNU @ TRECVID 2013: A Video Dance of Detection, Recounting and Search with Motion Relativity and Concept Learning from Wild

Chong-Wah Ngo, Feng Wang, Wei Zhang, Chun Chet Tan, Zhanhu Sun, Shiai Zhu, Ting Yao
2013 TREC Video Retrieval Evaluation  
Instance Search (INS): We submitted four runs in total, experimenting with three search paradigms for particular object retrieval: (1) an elastic spatial consistency checking method; (2) background context modeling; and (3) object mining, which augments the results by exploring frequent instances in TV series.  ...  Acknowledgment The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (CityU 118812).  ... 
dblp:conf/trecvid/NgoW0TSZY13 fatcat:jpbx3fr25jg35oiexaohdxntzy

Searching visual instances with topology checking and context modeling

Wei Zhang, Chong-Wah Ngo
2013 Proceedings of the 3rd ACM International Conference on Multimedia Retrieval - ICMR '13  
Based on the Bag-of-Words model, we propose two techniques tailored for Instance Search.  ...  Instance Search (INS) is a realistic problem initiated by TRECVID, which is to retrieve all occurrences of the query object, location, or person from a large video collection.  ...  For general instance types including non-rigid and non-planar objects, we elastically model the spatial topology with Delaunay Triangulation based visual words matching [21] .  ... 
doi:10.1145/2461466.2461477 dblp:conf/mir/ZhangN13 fatcat:efo7g3f4lvhirdxcfmqojwgqxu
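The Delaunay-based topology checking this entry describes can be illustrated with a small sketch. This is a hypothetical simplification, not the paper's method: it builds a Delaunay graph over matched keypoint locations in each frame (via `scipy.spatial.Delaunay`) and scores how many triangulation edges the two frames share, assuming point `i` in one set matches point `i` in the other.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_edges(points):
    """Return the set of undirected edges of the Delaunay triangulation."""
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:
        for i in range(3):
            a, b = simplex[i], simplex[(i + 1) % 3]
            edges.add((min(a, b), max(a, b)))
    return edges

def topology_consistency(query_pts, db_pts):
    """Fraction of the query's Delaunay edges preserved in the database frame,
    assuming query_pts[i] and db_pts[i] were matched (e.g. same visual word)."""
    q_edges = delaunay_edges(query_pts)
    shared = q_edges & delaunay_edges(db_pts)
    return len(shared) / max(len(q_edges), 1)
```

The appeal of a triangulation-based check over an affine or homography model is that the neighborhood graph deforms gracefully for non-rigid and non-planar objects, which is exactly the case the snippet highlights.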

NTT at TRECVID 2015: Instance Search

Xiaomeng Wu, Taiga Yoshida, Jun Shimamura, Hidehisa Nagano, Kunio Kashino, Takahito Kawanishi, Kaoru Hiramatsu, Takayuki Kurozumi, Tetsuya Kinebuchi
2015 TREC Video Retrieval Evaluation  
Specifically, the combinations of methods in our runs are as follows, where BOVW indicates the bag-of-visual-words model: Among the runs, F A NTT 1 and F A NTT 2 achieved the highest mean average precisions  ...  This system was tuned with the topics that were used for the instance search task of TRECVID 2014 and the BBC EastEnders dataset.  ...  CONCLUSION In this report, we proposed two practical spatial verification methods called EWGR and AF for instance search from videos.  ... 
dblp:conf/trecvid/WuYSNKKHKK15 fatcat:n45pfh5nhren7berroyzhnr3a4

IRIM at TRECVID 2014: Semantic Indexing and Instance Search

Nicolas Ballas, Benjamin Labbé, Hervé Le Borgne, Philippe Gosselin, David Picard, Miriam Redi, Bernard Mérialdo, Boris Mansencal, Jenny Benois-Pineau, Stéphane Ayache, Abdelkader Hamadi, Bahjat Safadi (+12 others)
2014 TREC Video Retrieval Evaluation  
This paper describes the IRIM participation in the TRECVID 2014 semantic indexing (SIN) and instance search (INS) tasks.  ...  The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking.  ...  Acknowledgments This work has been carried out in the context of the IRIM (Indexation et Recherche d'Information Multimédia) project of the GDR-ISIS research network of the CNRS.  ... 
dblp:conf/trecvid/BallasLBGPRMMBA14 fatcat:dikbbpghvndffaf4ogs2hjrk24

Informedia @ TRECVID 2018: Ad-hoc Video Search, Video to Text Description, Activities in Extended video

Jia Chen, Shizhe Chen, Qin Jin, Alexander G. Hauptmann, Po-Yao Huang, Junwei Liang, Vaibhav, Xiaojun Chang, Jiang Liu, Ting-Yao Hu, Wenhe Liu, Wei Ke (+7 others)
2018 TREC Video Retrieval Evaluation  
However, there are two main limitations of the most widely used cross-entropy (CE) function as the training target, namely exposure bias and mismatched targets in training and testing.  ...  In this section of the notebook, we present our system in the TRECVID Video to Text description generation task.  ...  We generate textually attended visual vectors t_v and visually attended text vectors v_t, then aggregate them to maximize the similarity for a given pair by computing the trace of inner products.  ... 
dblp:conf/trecvid/ChenCJH00VCLHLK18 fatcat:4hie3xjj65gwdeoe7odbddrmrq

VIREO @ TRECVID 2014: Instance Search and Semantic Indexing

Wei Zhang, Hao Zhang, Ting Yao, Yi-Jie Lu, Jingjing Chen, Chong-Wah Ngo
2014 TREC Video Retrieval Evaluation  
Our baseline system is based on the Bag-of-Words (BoW) model [2], augmented with Hamming Embedding [3], spatial verification via Delaunay Triangulation [4] and context weighting via the "Stare" model [5].  ...  We submitted two runs to test these newly added features: -2B M D VIREO.14 1: Late fusion of the detection scores using visual features. -2B M D VIREO.14 2: Late fusion of the detection scores using visual  ...  Science Foundation of China under Grant 61272290, and National Hi-Tech Research and Development Program (863 Program) of China under Grant 2014AA015102.  ... 
dblp:conf/trecvid/00310YLCN14 fatcat:jljwdvtvandnzob4kb3z33yiii
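The Hamming Embedding augmentation cited in this entry refines plain visual-word matching: each descriptor also carries a short binary signature, and two features count as a match only if they share a visual word and their signatures are close in Hamming distance. The sketch below is illustrative only; the projection matrix, median thresholds, and the `max_dist` value are assumptions, not the VIREO configuration.

```python
import numpy as np

def hamming_signature(desc, proj, medians):
    """Project a descriptor and binarize each dimension against its median."""
    return (desc @ proj > medians).astype(np.uint8)

def hamming_distance(sig_a, sig_b):
    """Number of differing bits between two binary signatures."""
    return int(np.count_nonzero(sig_a != sig_b))

def refined_match(word_a, word_b, sig_a, sig_b, max_dist=24):
    """Accept a match only on the same visual word with a small signature distance."""
    return word_a == word_b and hamming_distance(sig_a, sig_b) <= max_dist
```

The signature acts as a cheap second-stage filter inside each inverted-index posting list, discarding the many false matches that coarse quantization into visual words would otherwise allow.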

Deep Learning for Video Captioning: A Review

Shaoxiang Chen, Ting Yao, Yu-Gang Jiang
2019 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence  
One is to encode a video via a thorough understanding and learn a visual representation.  ...  As a connection between the two worlds of vision and language, video captioning is the task of producing a natural-language utterance (usually a sentence) that describes the visual content of a video.  ...  In the absence of spatial annotation, Shen et al. first adopted multiple instance learning to detect semantic concepts in video frames, and then selected spatial region sequences using submodular maximization  ... 
doi:10.24963/ijcai.2019/877 dblp:conf/ijcai/ChenYJ19 fatcat:3xxssrzqjjd5jbvtgkkp5lw7xa

The MediaMill at TRECVID 2013: Searching Concepts, Objects, Instances and Events in Video

Cees G. M. Snoek, Koen E. A. van de Sande, Daniel Fontijne, AmirHossein Habibian, Mihir Jain, Svetlana Kordumova, Zhenyang Li, Masoud Mazloom, Silvia L. Pintea, Ran Tao, Dennis C. Koelma, Arnold W. M. Smeulders
2013 TREC Video Retrieval Evaluation  
For all tasks the starting point is our top-performing bag-of-words system of TRECVID 2008-2012, which uses color SIFT descriptors, average and difference coded into codebooks with spatial pyramids and  ...  The MediaMill team participated in four tasks: concept detection, object localization, instance search, and event recognition.  ...  This research is supported by the STW STORY project, the Dutch national program COMMIT, and by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business  ... 
dblp:conf/trecvid/SnoekSFHJKLMP0K13 fatcat:5jqwbazglzad5fxibp2o7423x4

A Simple Baseline for Audio-Visual Scene-Aware Dialog

Idan Schwartz, Alexander G. Schwing, Tamir Hazan
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven way of learning virtual assistants, smart speakers and car navigation systems.  ...  We evaluate the proposed approach on the recently introduced and challenging audio-visual scene-aware dataset, and demonstrate the key features that permit it to outperform the current state-of-the-art by  ...  For instance, prediction of pose given audio input [60] , learning of audio-visual object models from unlabeled video for audio source separation in novel videos [20, 51] , use of video and audio data  ... 
doi:10.1109/cvpr.2019.01283 dblp:conf/cvpr/SchwartzSH19 fatcat:ghprtqitdbggtfxpejg4yfcjyq

Semantic Based Video Retrieval System: Survey

2018 Iraqi Journal of Science  
In addition, it presents a generic review of techniques that have been proposed to bridge the semantic gap, the major scientific problem in semantic-based video retrieval.  ...  The semantic gap arises from the difference between the low-level features extracted from video content and the user's perception of these features in the real world.  ...  [84] improves the trajectory-based bag-of-visual-words pipeline to retrieve video actions by combining spatio-temporal information.  ... 
doi:10.24996/ijs.2018.59.2a.12 fatcat:6fvq6pygqzglbptl4czxpzbjbm

Object Relational Graph with Teacher-Recommended Learning for Video Captioning [article]

Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zhengjun Zha
2020 arXiv   pre-print
Existing models lack adequate visual representation due to the neglect of interaction between objects, and sufficient training for content-related words due to the long-tailed problem.  ...  Taking full advantage of the information from both vision and language is critical for the video captioning task.  ...  In this work, we propose a graph-based approach, which constructs a temporal-spatial graph on all the objects in a video to enhance object-level representation. Visual Relational Reasoning.  ... 
arXiv:2002.11566v1 fatcat:r4qtu3uapfcdvixcvymi623s24

Object Relational Graph With Teacher-Recommended Learning for Video Captioning

Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zheng-Jun Zha
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Existing models lack adequate visual representation due to the neglect of interaction between objects, and sufficient training for content-related words due to the long-tailed problem.  ...  Taking full advantage of the information from both vision and language is critical for the video captioning task.  ...  In this work, we propose a graph-based approach, which constructs a temporal-spatial graph on all the objects in a video to enhance object-level representation. Visual Relational Reasoning.  ... 
doi:10.1109/cvpr42600.2020.01329 dblp:conf/cvpr/ZhangSY0WHZ20 fatcat:vxegivprwvgdhotyspwl3zo5ne
Showing results 1 — 15 out of 7,172 results