IMFD IMPRESEE at TRECVID 2019: Ad-Hoc Video Search and Video To Text

Rodrigo Hernández, Jesus Perez-Martin, Nicolás Bravo, Juan Manuel Barrios, Benjamin Bustos
2019 TREC Video Retrieval Evaluation  
In this paper we present an overview of our participation in TRECVID 2019 [1]. We participated in the Ad-hoc Video Search (AVS) task and in the Description Generation and Matching and Ranking subtasks of the Video to Text (VTT) task. First, for the AVS task, we develop a system architecture that we call "Word2AudioVisualVec++" (W2AVV++), based on Word2VisualVec++ (W2VV++) [11], which, in addition to deep visual features of the videos, also uses deep audio features obtained from pre-trained networks.
Second, for the VTT Matching and Ranking task, we develop another deep learning model based on Word2VisualVec++ that extracts temporal information from the video using Dense Trajectories [16] and a clustering approach to encode them into a single vector representation. Third, for the VTT Description Generation task, we develop an Encoder-Decoder model that incorporates semantic states into the Encoder phase.
dblp:conf/trecvid/HernandezPBBB19 fatcat:uc57auanvzagfd5rbjfue7vm7q