In this paper, we propose an entity-centric region-of-interest detection and visual-semantic pooling scheme for complex event detection in YouTube-like videos. Our method is based on the hypothesis that many YouTube-like videos involve people interacting with each other and with objects in their vicinity. Based on this hypothesis, we first discover an Area of Interest (AoI) map in image keyframes and then use the AoI map for localized pooling of features. The AoI map is derived from image based
doi:10.1145/2660505.2660506 dblp:conf/mm/ChakrabortyCJ14 fatcat:rjdw3zmyp5hvvpa4fygabsrwj4
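
To make the idea of localized pooling concrete, the following is a minimal sketch of one plausible reading: local descriptors extracted from a keyframe are averaged, with each descriptor weighted by the AoI value at its pixel location. The weighting rule, the function name `aoi_weighted_pool`, and the array layouts are assumptions made for illustration; the abstract does not specify how the AoI map is computed or how pooling is carried out, so this should not be read as the authors' method.

```python
import numpy as np


def aoi_weighted_pool(features, positions, aoi_map):
    """Pool local descriptors into one vector, weighted by an AoI map.

    features:  (n, d) array of local descriptors (hypothetical layout).
    positions: (n, 2) array of integer (row, col) pixel coordinates.
    aoi_map:   (h, w) array of non-negative interest scores.
    """
    features = np.asarray(features, dtype=float)
    positions = np.asarray(positions, dtype=int)
    aoi_map = np.asarray(aoi_map, dtype=float)

    # Look up the AoI score under each descriptor's location.
    weights = aoi_map[positions[:, 0], positions[:, 1]]
    total = weights.sum()

    # Assumption: fall back to plain average pooling if no descriptor
    # lies inside the area of interest.
    if total == 0:
        return features.mean(axis=0)

    # Weighted average of the descriptors.
    return (weights[:, None] * features).sum(axis=0) / total
```

With a binary AoI map this reduces to averaging only the descriptors that fall inside the area of interest, which is the simplest form of localized pooling; a soft map would instead down-weight descriptors from the background.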