CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016 [article]

Yuanjun Xiong, Limin Wang, Zhe Wang, Bowen Zhang, Hang Song, Wei Li, Dahua Lin, Yu Qiao, Luc Van Gool, Xiaoou Tang
2016 arXiv   pre-print
This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016. We follow the basic pipeline of temporal segment networks and further raise the performance via a number of other techniques. Specifically, we use the latest deep model architecture, e.g., ResNet and Inception V3, and introduce new aggregation schemes (top-k and attention-weighted pooling). Additionally, we incorporate the audio as a complementary channel,
more » ... ting relevant information via a CNN applied to the spectrograms. With these techniques, we derive an ensemble of deep models, which, together, attains a high classification accuracy (mAP 93.23%) on the testing set and secured the first place in the challenge.
arXiv:1608.00797v1 fatcat:ffawius2hfcs3dwklvwwxrhpue