24,100 Hits in 6.2 sec

Segmental multi-way local pooling for video recognition

Ilseo Kim, Sangmin Oh, Arash Vahdat, Kevin Cannons, A.G. Amitha Perera, Greg Mori
2013 Proceedings of the 21st ACM International Conference on Multimedia - MM '13
In this work, we address the problem of complex event detection in unconstrained videos. We introduce a novel multi-way feature pooling approach which leverages segment-level information.  ...  For classification, intersection kernel SVMs are used, where the kernel is obtained by combining multiple kernels computed from corresponding per-cluster descriptor pairs.  ...  MULTI-WAY LOCAL POOLING Our multi-way local pooling (MLP) method constructs multiple descriptors, instead of a single descriptor, given a feature type for a video clip, and then attempts to improve discriminant  ...
doi:10.1145/2502081.2502167 dblp:conf/mm/KimOVCPM13 fatcat:tz3uc73ttffnblkstouulzzimy

Fine-Grained Activity Recognition in Baseball Videos

AJ Piergiovanni, Michael S. Ryoo
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
We experimentally compare various recognition approaches capturing temporal structure in activity videos, by classifying segmented videos and extending those approaches to continuous videos.  ...  We find that learning temporal structure is valuable for fine-grained activity recognition.  ...  We experimentally compare various recognition approaches with temporal feature pooling for both segmented and continuous videos.  ... 
doi:10.1109/cvprw.2018.00226 dblp:conf/cvpr/PiergiovanniR18a fatcat:4fqsoji23rabbluin5kfwu3eim

Fine-grained Activity Recognition in Baseball Videos [article]

AJ Piergiovanni, Michael S. Ryoo
2018 arXiv   pre-print
We experimentally compare various recognition approaches capturing temporal structure in activity videos, by classifying segmented videos and extending those approaches to continuous videos.  ...  We find that learning temporal structure is valuable for fine-grained activity recognition.  ...  We experimentally compare various recognition approaches with temporal feature pooling for both segmented and continuous videos.  ... 
arXiv:1804.03247v1 fatcat:6rxnufsosfgfzifighabrz7qym

Deeply-Supervised CNN Model for Action Recognition with Trainable Feature Aggregation

Yang Li, Kan Li, Xinxin Wang
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits powerful hierarchical features of CNNs.  ...  In this model, we build multi-level video representations by applying our proposed aggregation module at different convolutional layers.  ...  In this way, we exploit multi-level video representations in a single network, which brings improvement in both performance and efficiency for action recognition.  ... 
doi:10.24963/ijcai.2018/112 dblp:conf/ijcai/LiLW18 fatcat:24pxm6ah4zfnpe7paqjbqajsz4

Comprehensive Video Understanding: Video summarization with content-based video recommender design [article]

Yudong Jiang, Kaixu Cui, Bo Peng, Changliang Xu
2019 arXiv   pre-print
A scalable deep neural network is proposed for predicting whether a video segment is useful to users by explicitly modelling both the segment and the video.  ...  We also extend our work with data augmentation and multi-task learning to prevent the model from early-stage overfitting.  ...  We tried two ways of training: (1) action and scene recognition tasks sharing the same backbone, with a SoftMax loss branch for each classification task.  ...
arXiv:1910.13888v1 fatcat:sb2dm3x7szf6tcvoymcrsq2ezu

Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition

Dinghao Fan, Hengjie Lu, Shugong Xu, Shan Cao
2021 IEEE Sensors Journal  
Our framework is trained to learn a representation for multi-task learning: gesture segmentation and gesture recognition.  ...  The depth modality contains prior information about the location of the gesture; therefore, it can be used as supervision for gesture segmentation.  ...  In the future, we will explore multi-task learning on other video understanding tasks (e.g., temporal localization) with more modalities, such as skeleton data.  ...
doi:10.1109/jsen.2021.3123443 fatcat:4biyoph3xbe6dksji53pzpcc6i

Guest Editorial: Ad Hoc Web Multimedia Analysis with Limited Supervision

Yahong Han, Yi Yang, Jingdong Wang
2015 Multimedia tools and applications  
In "Boosted MIML method for weakly-supervised image semantic segmentation" (10.1007/s11042-014-1967-5), the authors propose a Boosted Multi-Instance Multi-Label (BMIML) learning method for image semantic segmentation  ...  In "Max-Margin Adaptive Model for Complex Video Pattern Recognition" (10.1007/s11042-014-2010-6), a max-margin adaptive (MMA) model for complex video pattern recognition is proposed, which can utilize  ...
doi:10.1007/s11042-014-2419-y fatcat:hqct5eabprg75pocgful4sqn5e

Multi-Instance Multi-Label Action Recognition and Localization Based on Spatio-Temporal Pre-Trimming for Untrimmed Videos

Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Peng Li
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence and the Thirty-Second Innovative Applications of Artificial Intelligence Conference
Weakly supervised action recognition and localization for untrimmed videos is a challenging problem with extensive applications.  ...  Given the bag-of-instances structure associated with video-level labels, action recognition is naturally formulated as a multi-instance multi-label learning problem.  ...  PreTrimNet, for effective action recognition and localization in untrimmed videos with video-level weak supervision.  ... 
doi:10.1609/aaai.v34i07.6986 fatcat:bxywbawmkfetxjy2ji5ln6k55i

Spatiotemporal Multi-Task Network for Human Activity Understanding

Yao Liu, Jianqiang Huang, Chang Zhou, Deng Cai, Xian-Sheng Hua
2017 Proceedings of the Thematic Workshops of ACM Multimedia 2017 - Thematic Workshops '17
To tackle these problems, we propose a spatiotemporal, multi-task, 3D deep convolutional neural network to detect (i.e., temporally localize and recognize) actions in untrimmed videos.  ...  Then, under the fusion framework, we propose a spatiotemporal multi-task network, which has two sibling output layers for action classification and temporal localization, respectively.  ...  For video analysis, Diba [5] proposed a multi-task framework that jointly performs action recognition and motion estimation, achieving impressive performance for action recognition.  ...
doi:10.1145/3126686.3126705 dblp:conf/mm/LiuHZCH17 fatcat:vauwsk6ndbd7fhxhel3swqf4fq

Rethinking the Faster R-CNN Architecture for Temporal Action Localization

Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Rahul Sukthankar
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework.  ...  We achieve state-of-the-art performance for both action proposal and localization on the THUMOS'14 detection benchmark and competitive performance on the ActivityNet challenge.  ...  Acknowledgement We thank João Carreira and Susanna Ricco for their help with the I3D models and optical flow.  ...
doi:10.1109/cvpr.2018.00124 dblp:conf/cvpr/ChaoVSRDS18 fatcat:5voczqxgqrazfjbg7ensqycqdm

Temporal Segment Networks for Action Recognition in Videos [article]

Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
2017 arXiv   pre-print
The learned models could be easily adapted for action recognition in both trimmed and untrimmed videos with simple average pooling and multi-scale temporal window integration, respectively.  ...  Deep convolutional networks have achieved great success for image recognition. However, for action recognition in videos, their advantage over traditional methods is not so evident.  ...  Vector of Locally Aggregated Descriptors (VLAD) [43], and Multi-View Super Vector (MVSV) [44].  ...
arXiv:1705.02953v1 fatcat:uv2haezgobfbrffgro4falihze

TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation [article]

Dongxu Li, Chenchen Xu, Xin Yu, Kaihao Zhang, Ben Swift, Hanna Suominen, Hongdong Li
2020 arXiv   pre-print
To this end, we first present a novel sign video segment representation which takes into account multiple temporal granularities, thus alleviating the need for accurate video segmentation.  ...  Specifically, TSPNet introduces an inter-scale attention to evaluate and enhance local semantic consistency of sign segments and an intra-scale attention to resolve semantic ambiguity by using non-local  ...  Given the multi-scale segment representation of a video, we employ a 3D convolution network I3D [22] to extract video features for each segment.  ... 
arXiv:2010.05468v1 fatcat:qoj67klu2va6jk4v6klg37bhwa

ActivityNet Challenge 2017 Summary [article]

Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Ranjay Krishna, Victor Escorcia, Kenji Hata, Shyamal Buch
2017 arXiv   pre-print
The ActivityNet Large Scale Activity Recognition Challenge 2017 Summary: results and challenge participants papers.  ...  We would like to thank the authors of the Kinetics dataset for their kind support; and Joao Carreira and Brian Zhang for helpful discussions.  ...  For a given video sequence we perform global average pooling on short segments of consecutive frames.  ... 
arXiv:1710.08011v1 fatcat:bc5qhp2cungrdj4j3lebxeoane

Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors [chapter]

Johanna Carvajal, Chris McCool, Brian Lovell, Conrad Sanderson
2016 Lecture Notes in Computer Science  
We propose a hierarchical approach to multi-action recognition that performs joint classification and segmentation.  ...  A given video (containing several consecutive actions) is processed via a sequence of overlapping temporal windows.  ...  To learn the parameters of the multi-class SVM, we used video segments containing single actions. For s-KTH this process is straightforward as the videos have been previously segmented.  ... 
doi:10.1007/978-3-319-42996-0_10 fatcat:rp3dnqsstfbjzi2sdq5oywanyq

METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos

Da Zhang, Xiyang Dai, Yuan-Fang Wang
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Existing Temporal Activity Localization (TAL) methods largely adopt strong supervision for model training, which requires (1) vast amounts of untrimmed videos per activity category and (2) accurate segment-level boundary annotations (start time and end time) for every instance.  ...  All the experiments are conducted for five-way one-shot localization.  ...
doi:10.1109/cvpr42600.2020.00394 dblp:conf/cvpr/ZhangDW20 fatcat:weqsqnhlpbcihhgmvzkapifzq4
Showing results 1 — 15 out of 24,100 results