Joint Video and Text Parsing for Understanding Events and Answering Queries
2014
IEEE Multimedia
We propose a multimedia analysis framework to process video and text jointly for understanding events and answering user queries. Our framework produces a parse graph that represents the compositional structures of spatial information (objects and scenes), temporal information (actions and events) and causal information (causalities between events and fluents) in the video and text. The knowledge representation of our framework is based on a spatial-temporal-causal And-Or graph (S/T/C-AOG),
doi:10.1109/mmul.2014.29
fatcat:bunjekcxezhffkinjx2zet2afm