In this paper we present an overview of our participation in TRECVID 2019. We participated in the Ad-hoc Video Search (AVS) task and in the Description Generation and Matching and Ranking subtasks of the Video to Text (VTT) task. First, for the AVS task, we develop a system architecture that we call "Word2AudioVisualVec++" (W2AVV++), based on Word2VisualVec++ (W2VV++), that uses deep audio features obtained from pre-trained networks in addition to deep visual features of videos.
dblp:conf/trecvid/HernandezPBBB19 fatcat:uc57auanvzagfd5rbjfue7vm7q