Discriminative Unsupervised Alignment of Natural Language Instructions with Corresponding Video Segments

Iftekhar Naim, Young C. Song, Qiguang Liu, Liang Huang, Henry Kautz, Jiebo Luo, Daniel Gildea
2015 Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies  
We address the problem of automatically aligning natural language sentences with corresponding video segments without any direct supervision. Most existing algorithms for integrating language with videos rely on handaligned parallel data, where each natural language sentence is manually aligned with its corresponding image or video segment. Recently, fully unsupervised alignment of text with video has been shown to be feasible using hierarchical generative models. In contrast to the previous
more » ... erative models, we propose three latent-variable discriminative models for the unsupervised alignment task. The proposed discriminative models are capable of incorporating domain knowledge, by adding diverse and overlapping features. The results show that discriminative models outperform the generative models in terms of alignment accuracy.
doi:10.3115/v1/n15-1017 dblp:conf/naacl/NaimSLHKLG15 fatcat:edmlv3hrqvekvm3fowmx7kkp5u