Combining multiple sources of knowledge in deep CNNs for action recognition

Eunbyung Park, Xufeng Han, Tamara L. Berg, Alexander C. Berg
2016 IEEE Winter Conference on Applications of Computer Vision (WACV)
Although deep convolutional neural networks (CNNs) have shown remarkable results for feature learning and prediction tasks, many recent studies have demonstrated improved performance by incorporating additional hand-crafted features or by fusing predictions from multiple CNNs. Usually, these combinations are implemented via feature concatenation or by averaging output prediction scores from several CNNs. In this paper, we present new approaches for combining different sources of knowledge in deep learning. First, we propose feature amplification, where we use an auxiliary hand-crafted feature (e.g., optical flow) to perform spatially varying soft-gating on intermediate CNN feature maps. Second, we present a spatially varying multiplicative fusion method for combining multiple CNNs trained on different sources, which yields robust predictions by amplifying or suppressing feature activations based on their agreement. We test these methods in the context of action recognition, where information from both spatial and temporal cues is useful, obtaining results that are comparable with state-of-the-art methods and that outperform methods using only CNNs and optical flow features.
doi:10.1109/wacv.2016.7477589 dblp:conf/wacv/ParkHBB16 fatcat:m7b7wusrjzalnpgfsa6deb7pl4
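
The two ideas named in the abstract, spatially varying soft-gating and multiplicative fusion, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `amplify_features` and `multiplicative_fusion`, the min-max scaling of the flow magnitude into a gate, and the assumption that the flow map has already been resized to the feature-map resolution are all choices made here for illustration, since the abstract does not specify these details.

```python
import numpy as np


def amplify_features(feature_maps, flow_magnitude, eps=1e-8):
    """Soft-gate CNN feature maps with a per-location weight from optical flow.

    feature_maps:   array of shape (C, H, W), intermediate CNN activations.
    flow_magnitude: array of shape (H, W), non-negative optical-flow magnitude,
                    assumed to be already resized to the feature-map resolution.
    """
    # Assumption: min-max scaling turns the flow magnitude into a gate in [0, 1].
    # The paper may normalize differently; the abstract does not say.
    lo, hi = flow_magnitude.min(), flow_magnitude.max()
    gate = (flow_magnitude - lo) / (hi - lo + eps)
    # Broadcast the (H, W) gate over all C channels.
    return feature_maps * gate[None, :, :]


def multiplicative_fusion(spatial_maps, temporal_maps):
    """Fuse two same-shaped feature maps by element-wise product.

    For non-negative activations (e.g. after a ReLU), the product is large
    only where both streams respond and is suppressed where either is weak.
    """
    return spatial_maps * temporal_maps


# Hypothetical usage with random stand-ins for real activations.
rng = np.random.default_rng(0)
rgb_features = rng.random((4, 8, 8))    # stand-in for a spatial-stream layer
flow_features = rng.random((4, 8, 8))   # stand-in for a temporal-stream layer
flow_mag = rng.random((8, 8))           # stand-in for optical-flow magnitude

gated = amplify_features(rgb_features, flow_mag)
fused = multiplicative_fusion(gated, flow_features)
```

Unlike concatenation or score averaging, both operations act per spatial location, so a region with little motion or with disagreement between the streams contributes less to later layers.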