637 Hits in 5.8 sec

Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics [article]

Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Wei Liu, Yun-hui Liu
2021 arXiv   pre-print
This paper proposes a novel pretext task to address the self-supervised video representation learning problem.  ...  Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion, the spatial location  ...  In this paper, enlightened by the human visual system [21], we propose a novel pretext task to learn video representation by uncovering spatio-temporal statistical summaries from unlabeled videos.  ... 
arXiv:2008.13426v2 fatcat:otdi373j75eg3b6skktquxjehq
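
To make the kind of statistical summary described above concrete, the sketch below derives one such label from a pair of frames: the grid location and dominant direction of the largest motion, computed from dense optical flow. It is an illustrative reconstruction rather than the authors' exact pipeline; the Farneback flow, the 4x4 grid, and the 8 angle bins are assumptions.

    # Illustrative sketch (not the paper's exact method): find the grid cell with the
    # largest motion energy and its dominant flow direction, one of the statistical
    # summaries described above. Grid size and angle binning are arbitrary choices here.
    import cv2
    import numpy as np

    def largest_motion_summary(prev_gray, next_gray, grid=4, bins=8):
        """Return (block_row, block_col, dominant_angle_bin) for the most-moving cell."""
        flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])   # magnitude, angle (radians)
        h, w = mag.shape
        bh, bw = h // grid, w // grid
        best, summary = -1.0, None
        for r in range(grid):
            for c in range(grid):
                m = mag[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
                a = ang[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
                energy = float(m.sum())
                if energy > best:
                    # dominant direction = peak of the magnitude-weighted angle histogram
                    hist, _ = np.histogram(a, bins=bins, range=(0, 2 * np.pi), weights=m)
                    best, summary = energy, (r, c, int(hist.argmax()))
        return summary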

Nonparametric discovery of activity patterns from video collections

Michael C. Hughes, Erik B. Sudderth
2012 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops  
Bayesian nonparametric statistical methods allow the number of such behaviors and the subset exhibited by each video to be learned without supervision.  ...  Video retrieval experiments show that our approach leads to quantitative improvements over conventional bag-of-feature representations.  ...  Acknowledgments: The data used in this paper was obtained from kitchen.cs.cmu.edu and the data collection was funded in part by the National Science Foundation under Grant No. EEEC-0540865  ... 
doi:10.1109/cvprw.2012.6239170 dblp:conf/cvpr/HughesS12 fatcat:xg5bvpi3krelnbx7cgjztautv4
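
For context, the conventional bag-of-features baseline that the retrieval experiments are compared against can be sketched as follows: local descriptors from each video are quantized against a k-means codebook and pooled into a normalized histogram. This is a generic sketch of the baseline, not the paper's Bayesian nonparametric model; the codebook size is an assumption.

    # Generic bag-of-features video representation (the baseline mentioned above),
    # not the paper's Bayesian nonparametric model. Codebook size is illustrative.
    import numpy as np
    from sklearn.cluster import KMeans

    def bag_of_features(per_video_descriptors, n_words=200):
        """per_video_descriptors: list of (n_i, d) arrays of local features, one per video."""
        codebook = KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(per_video_descriptors))
        histograms = []
        for desc in per_video_descriptors:
            words = codebook.predict(desc)
            h = np.bincount(words, minlength=n_words).astype(float)
            histograms.append(h / max(h.sum(), 1.0))
        return np.stack(histograms)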

Automatic Analysis of Facial Affect: A Survey of Registration, Representation, and Recognition

Evangelos Sariyanidi, Hatice Gunes, Andrea Cavallaro
2015 IEEE Transactions on Pattern Analysis and Machine Intelligence  
Moreover, we provide a comprehensive analysis of facial representations by uncovering their advantages and limitations; we elaborate on the type of information they encode and discuss how they deal with  ...  We analyse the state-of-the-art solutions by decomposing their pipelines into fundamental components, namely face registration, representation, dimensionality reduction and recognition.  ...  An interesting future direction is developing novel spatio-temporal representation paradigms to extract features from video volumes.  ... 
doi:10.1109/tpami.2014.2366127 pmid:26357337 fatcat:5uv4jaqu4nhihnkwcpqimvylye

Robust Representation and Recognition of Facial Emotions Using Extreme Sparse Learning

Seyedehsamaneh Shojaeilangari, Wei-Yun Yau, Karthik Nandakumar, Jun Li, Eam Khwang Teoh
2015 IEEE Transactions on Image Processing  
ACKNOWLEDGMENT: This research is supported by the Agency for Science, Technology and Research (A*STAR), Singapore. The authors would like to thank the reviewers for their valuable comments.  ...  Spatio-Temporal Descriptor Construction: A spatio-temporal descriptor is obtained by concatenating the spatio-temporal features extracted at each local region in the video.  ...  To recognize the emotions in the presence of self-occlusion and illumination variations, we combine the idea of sparse representation with Extreme Learning Machine (ELM) to learn a powerful classifier  ... 
doi:10.1109/tip.2015.2416634 pmid:25823034 fatcat:lwhs5wv6gjgnzig7fb7ucmsrgq
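
The descriptor-construction step quoted above (concatenating spatio-temporal features from each local region) can be sketched roughly as below; the grid layout and the per-region feature are placeholders, not the paper's actual descriptor.

    # Rough sketch of concatenating per-region features over a video volume.
    # `region_feature` is a hypothetical placeholder for any local spatio-temporal descriptor.
    import numpy as np

    def region_feature(cube):
        # placeholder local descriptor: mean and standard deviation of the sub-volume
        return np.array([cube.mean(), cube.std()])

    def spatio_temporal_descriptor(video, grid=(2, 4, 4)):
        """video: (T, H, W) array; returns one vector concatenating all local region features."""
        T, H, W = video.shape
        gt, gh, gw = grid
        feats = []
        for t in range(gt):
            for r in range(gh):
                for c in range(gw):
                    cube = video[t * T // gt:(t + 1) * T // gt,
                                 r * H // gh:(r + 1) * H // gh,
                                 c * W // gw:(c + 1) * W // gw]
                    feats.append(region_feature(cube))
        return np.concatenate(feats)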

Crowded Scene Analysis: A Survey

Teng Li, Huan Chang, Meng Wang, Bingbing Ni, Richang Hong, Shuicheng Yan
2015 IEEE Transactions on Circuits and Systems for Video Technology (Print)  
In the past few years, an increasing number of works on crowded scene analysis have been reported, covering different aspects including crowd motion pattern learning, crowd behavior and activity analysis  ...  activities in a query video by identifying local spatio-temporal motion patterns with low likelihoods.  ...  [3] introduced a statistical model for motion pattern representation based on raw optical flow. The method is based on hierarchical problem-specific learning.  ... 
doi:10.1109/tcsvt.2014.2358029 fatcat:prgoh37gjfcl7n6dp2u6tsdoda

Event and Activity Recognition in Video Surveillance for Cyber-Physical Systems [chapter]

Swarnabja Bhaumik, Prithwish Jana, Partha Pratim Mohanta
2021 Advanced Studies in Energy Efficiency and Built Environment for Developing Countries  
Consequently, each video is significantly represented by a fixed number of key-frames using a graph-based approach.  ...  A consolidated representation of the respective individual prediction vectors on video and frame levels is obtained using a biased conflation technique.  ...  On the other hand, some of the approaches by Luo et al. (2019) and Li et al. (2020) attempt to exploit the spatio-temporal features through fusion of the respective softmax scores of the deep learning  ... 
doi:10.1007/978-3-030-66222-6_4 fatcat:xly3tn3mzvbafgzaserbf3xsey
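
The "biased conflation" used to consolidate the frame-level and video-level prediction vectors is not fully specified in the snippet; the sketch below shows a standard conflation (normalized elementwise product of distributions) with per-source weights standing in for the bias. The weighting scheme is an assumption for illustration only.

    # Weighted conflation of softmax prediction vectors into one fused distribution.
    # The per-source weights (the "bias") are an assumption, not necessarily the paper's scheme.
    import numpy as np

    def biased_conflation(prob_vectors, weights=None, eps=1e-12):
        """prob_vectors: (n_sources, n_classes) softmax outputs; returns a fused distribution."""
        P = np.asarray(prob_vectors, dtype=float) + eps
        if weights is None:
            weights = np.ones(len(P))
        fused = np.prod(P ** np.asarray(weights)[:, None], axis=0)
        return fused / fused.sum()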

Improving the Diagnosis of Psychiatric Disorders with Self-Supervised Graph State Space Models [article]

Ahmed El Gazzar, Rajat Mani Thomas, Guido Van Wingen
2022 arXiv   pre-print
Next, we train a supervised classifier on the learned discriminative representations.  ...  First, we propose a self-supervised mask prediction task on data from healthy individuals that can exploit differences between healthy controls and patients in clinical datasets.  ...  To address these challenges, we propose a two stage framework of self-supervised learning on data from healthy individuals followed by supervised learning on learned representations.  ... 
arXiv:2206.03331v1 fatcat:yip6olisbvh35bcqyjar4si5oq
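
The mask-prediction pretext task can be illustrated roughly as below: random time points of a multivariate signal (e.g. regional fMRI time series) are hidden and a model is trained to reconstruct them, with the loss evaluated only at the masked positions. The masking ratio and the loss are illustrative assumptions, not the paper's exact setup.

    # Illustrative mask-prediction pretext task on a (regions, timepoints) signal.
    # Masking ratio and loss choice are assumptions for illustration only.
    import numpy as np

    def make_masked_sample(x, mask_ratio=0.15, rng=None):
        """Return (masked_input, mask): whole time points are zeroed out at random."""
        rng = rng or np.random.default_rng()
        mask = rng.random(x.shape[1]) < mask_ratio
        x_masked = x.copy()
        x_masked[:, mask] = 0.0
        return x_masked, mask

    def reconstruction_loss(pred, target, mask):
        """Mean squared error computed only on the masked time points."""
        return float(((pred[:, mask] - target[:, mask]) ** 2).mean())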

Human detection in surveillance videos and its applications - a review

Manoranjan Paul, Shah M E Haque, Subrata Chakraborty
2013 EURASIP Journal on Advances in Signal Processing  
Object detection could be performed using background subtraction, optical flow and spatio-temporal filtering techniques.  ...  A comprehensive review with comparisons on available techniques for detecting human beings in surveillance videos is presented in this paper.  ...  [137] proposed a novel linear programming relaxation algorithm for predicting player identification in a video clip using weakly supervised learning with play-by-play texts, which greatly reduced the  ... 
doi:10.1186/1687-6180-2013-176 fatcat:hucglmedkrffxdvyo4fi7lthqa
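
Of the three detection strategies listed in the snippet, background subtraction is the simplest to illustrate; the sketch below uses OpenCV's MOG2 model to produce foreground bounding boxes per frame. Parameters such as the history length and minimum area are illustrative defaults, not values from the survey.

    # Background-subtraction detection sketch using OpenCV's MOG2 model.
    # History, variance threshold and minimum contour area are illustrative defaults.
    import cv2
    import numpy as np

    def detect_foreground_boxes(video_path, min_area=500):
        cap = cv2.VideoCapture(video_path)
        subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
        boxes_per_frame = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            mask = subtractor.apply(frame)
            mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            boxes_per_frame.append([cv2.boundingRect(c) for c in contours
                                    if cv2.contourArea(c) >= min_area])
        cap.release()
        return boxes_per_frame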

Deep Learning for Spatio-Temporal Data Mining: A Survey [article]

Senzhang Wang, Jiannong Cao, Philip S. Yu
2019 arXiv   pre-print
predictive learning, representation learning, anomaly detection and classification.  ...  As the number, volume and resolution of spatio-temporal datasets increase rapidly, traditional data mining methods, especially statistics-based methods for dealing with such data, are becoming overwhelmed  ...  Index Terms: Deep learning, spatio-temporal data, data mining.  ... 
arXiv:1906.04928v2 fatcat:4zrdtgkvirfuniq3rb2gl7ohpy

Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization

Shoou-I Yu, Yi Yang, Alexander Hauptmann
2013 2013 IEEE Conference on Computer Vision and Pattern Recognition  
Local learning approaches are used to uncover the manifold structure in the appearance space with spatio-temporal constraints.  ...  We propose a tracking-by-detection approach with nonnegative discretization to tackle this problem.  ...  Acknowledgements: This work is supported in part by the National Science Foundation under Grant IIS-0917072.  ... 
doi:10.1109/cvpr.2013.476 dblp:conf/cvpr/YuYH13 fatcat:dlyckx62ejf7zpvkpyl3n2kel4

Exploiting textures for better action recognition in low-quality videos

Saimunur Rahman, John See, Chiung Ching Ho
2017 EURASIP Journal on Image and Video Processing  
In this paper, we address the problem of action recognition in low-quality videos from a myriad of perspectives: spatial and temporal downsampling, video compression, and the presence of motion blurring  ...  To increase the resilience of feature representation in these types of videos, we propose to use textural features to complement classical shape and motion features.  ...  Spatio-temporal textures circumvent this feature detection step by relying on statistical regularities across the spatio-temporal cube.  ... 
doi:10.1186/s13640-017-0221-2 fatcat:cao4dr7yxnciliw2swjrq6gsmi
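
A minimal example of the kind of spatio-temporal texture statistic described above is an LBP-TOP-style descriptor: local binary pattern histograms computed on three orthogonal planes of the video cube and concatenated. The plane sampling and LBP parameters below are assumptions and only approximate the idea.

    # LBP-TOP-style texture sketch: LBP histograms on three orthogonal planes of the cube.
    # Plane selection (middle slices) and (P, R) parameters are illustrative assumptions.
    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_top(video, P=8, R=1):
        """video: (T, H, W) grayscale cube; returns concatenated histograms over three planes."""
        T, H, W = video.shape
        planes = [video[T // 2],          # XY plane (middle frame)
                  video[:, H // 2, :],    # XT plane (middle row over time)
                  video[:, :, W // 2]]    # YT plane (middle column over time)
        n_bins = P + 2                    # number of 'uniform' LBP codes
        hists = []
        for plane in planes:
            codes = local_binary_pattern(plane, P, R, method="uniform")
            h, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
            hists.append(h / max(h.sum(), 1))
        return np.concatenate(hists)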

Object Priors for Classifying and Localizing Unseen Actions [article]

Pascal Mettes, William Thong, Cees G. M. Snoek
2021 arXiv   pre-print
Where existing work relies on transferring global attribute or object information from seen to unseen action videos, we seek to classify and spatio-temporally localize unseen actions in videos from image-based  ...  This work strives for the classification and localization of human actions in videos, without the need for any labeled video training examples.  ...  Self-supervised video learning: Recently, a number of works have proposed approaches for representation learning for unlabeled videos through self-supervision.  ... 
arXiv:2104.04715v1 fatcat:lsj5bhe4xzaxpdfx73xvkehhse

Decoding dynamic affective responses to naturalistic videos with shared neural patterns

Hang-Yee Chan, Ale Smidts, Vincent C. Schoots, Alan G. Sanfey, Maarten A.S. Boksem
2020 NeuroImage  
Within participants, neural classifiers identified valence and arousal categories of pictures, and tracked self-report valence and arousal during video watching.  ...  Our findings provide further support for the possibility of using pre-trained neural representations to decode dynamic affective responses during a naturalistic experience.  ...  By averaging across participants, the valence and arousal self-report time series of the videos were obtained.  ... 
doi:10.1016/j.neuroimage.2020.116618 pmid:32036021 fatcat:qg3yw3iz5ffinnoi4kywblfosa

Object Priors for Classifying and Localizing Unseen Actions

Pascal Mettes, William Thong, Cees G. M. Snoek
2021 International Journal of Computer Vision  
Where existing work relies on transferring global attribute or object information from seen to unseen action videos, we seek to classify and spatio-temporally localize unseen actions in videos from image-based  ...  This work strives for the classification and localization of human actions in videos, without the need for any labeled video training examples.  ...  Self-supervised approaches utilize unlabeled train videos to learn representations without semantic class labels.  ... 
doi:10.1007/s11263-021-01454-y fatcat:gwultlkdqzcafjgwxbhwli4qzm

AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning [article]

Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala
2021 arXiv   pre-print
When developing computer vision models that can reason about compositional spatio-temporal events, we need benchmarks that can analyze progress and uncover shortcomings.  ...  We present Action Genome Question Answering (AGQA), a new benchmark for compositional spatio-temporal reasoning. AGQA contains 192M unbalanced question answer pairs for 9.6K videos.  ...  This work was partially supported by the CRA DREU program, the Stanford HAI Institute, and the Brown Institute.  ... 
arXiv:2103.16002v1 fatcat:vkcqfxgssvb5bjwp7zvqetbpti
Showing results 1 — 15 out of 637 results