Unsupervised Video Decomposition using Spatio-temporal Iterative Inference [article]

Polina Zablotskaia, Edoardo A. Dominici, Leonid Sigal, Andreas M. Lehrmann
2020 arXiv   pre-print
We propose a novel spatio-temporal iterative inference framework that is powerful enough to jointly model complex multi-object representations and explicit temporal dependencies between latent variables  ...  This is achieved by leveraging 2D-LSTM, temporally conditioned inference and generation within the iterative amortized inference for posterior refinement.  ...  Spatio-Temporal Iterative Inference Our proposed model builds upon the concepts introduced in the previous section and enables robust learning of dynamic scenes through spatio-temporal iterative inference  ... 
arXiv:2006.14727v1 fatcat:pvk3z4jqe5fkzogaaf7o73pz6i

SOLD: Sub-optimal low-rank decomposition for efficient video segmentation

Chenglong Li, Liang Lin, Wangmeng Zuo, Shuicheng Yan, Jin Tang
2015 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
The video can be segmented into several spatio-temporal regions by applying the Normalized-Cut (NCut) algorithm with the solved low-rank representation.  ...  This paper investigates how to perform robust and efficient unsupervised video segmentation while suppressing the effects of data noises and/or corruptions.  ...  Instead of using superpixels in previous works like [13, 16], we take supervoxels as graph nodes to infer their optimal affinities because they can preserve local spatio-temporal coherence as well as  ... 
doi:10.1109/cvpr.2015.7299191 dblp:conf/cvpr/LiLZYT15 fatcat:vifbciu4kvhfjjrke726xvfeki

Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos [article]

Ajay Kumar Tanwani, Pierre Sermanet, Andy Yan, Raghav Anand, Mariano Phielipp, Ken Goldberg
2020 arXiv   pre-print
We only use a small set of labeled video segments to semantically align the embedding space and assign pseudo-labels to the remaining unlabeled data by inference on the learned model parameters.  ...  We demonstrate the use of this representation to imitate surgical suturing motions from publicly available videos of the JIGSAWS dataset.  ...  Sequence Learning and Inference of Action Segments We capture the spatio-temporal dependencies in the embedded observations to predict action segments with a sequence learning model.  ... 
arXiv:2006.00545v1 fatcat:l7r5yhmm5jbmtckxrhi43xacuu

Detection of Anomalous Crowd Behavior Using Spatio-Temporal Multiresolution Model and Kronecker Sum Decompositions [article]

Kristjan Greenewald, Alfred Hero
2014 arXiv   pre-print
In this work we consider the problem of detecting anomalous spatio-temporal behavior in videos.  ...  Due to the extreme lack of available training samples relative to the dimension of the distribution, we use a mean and covariance approach and consider methods of learning the spatio-temporal covariance  ...  This model lends itself to coordinate decompositions [16], [14], [1]. For spatio-temporal data, we consider the natural decomposition of space (pixels) vs. time (frames) as done in [1].  ... 
arXiv:1401.3291v2 fatcat:siacgmcqtffl3bn4nnwlieez4a

SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition [article]

Rishabh Kabra, Daniel Zoran, Goker Erdogan, Loic Matthey, Antonia Creswell, Matthew Botvinick, Alexander Lerchner, Christopher P. Burgess
2021 arXiv   pre-print
We present an unsupervised variational approach to this problem.  ...  Leveraging the shared structure that exists across different scenes, our model learns to infer two sets of latent representations from RGB video input alone: a set of "object" latents, corresponding to  ...  Unsupervised video decomposition using spatio-temporal iterative inference. arXiv preprint arXiv:2006.14727, 2020. [19] S M Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari  ... 
arXiv:2106.03849v2 fatcat:vtgeigfcrbgj5jnwka2e2djgxe

Mapping Mouse Behavior with an Unsupervised Spatio-temporal Sequence Decomposition Framework [article]

Kang Huang, Yaning Han, Ke Chen, Hongli Pan, Wenling Yi, Xiaoxi Li, Siyuan Liu, Pengfei Wei, Liping Wang
2020 bioRxiv   pre-print
For rodents, this has remained a challenge due to the high-dimensionality and large temporal variability of their behavioral features.  ...  Inspired by the natural structure of animal behavior, the present study uses a parallel, multi-stage approach to decompose motion features and generate an objective metric for mapping rodent behavior into  ...  We used an unsupervised clustering algorithm to verify the spatio-temporal representation of animal behavior and identify the movement phenotypes.  ... 
doi:10.1101/2020.09.14.295808 fatcat:zvenzbxxgnbcnk63deshzqsx4q

Multi-Stream Dynamic Video Summarization [article]

Mohamed Elfeki, Liqiang Wang, Ali Borji
2021 arXiv   pre-print
With vast amounts of video content being uploaded to the Internet every minute, video summarization becomes critical for efficient browsing, searching, and indexing of visual content.  ...  We conduct extensive experiments on the compiled dataset in addition to three other standard benchmarks that show the robustness and the advantage of our approach in both supervised and unsupervised settings  ...  Multi-Video Summarization Unlike multi-view, multivideo [49, 24] focuses on spatio-temporally independent videos and thus, can be processed individually.  ... 
arXiv:1812.00108v4 fatcat:4lizrpsj5bgr5nupdr7iyr2uye

Recognizing activities with cluster-trees of tracklets

Adrien Gaidon, Zaid Harchaoui, Cordelia Schmid
2012 Proceedings of the British Machine Vision Conference 2012  
We address the problem of recognizing complex activities, such as pole vaulting, which are characterized by the composition of a large and variable number of different spatio-temporal parts.  ...  We represent a video as a hierarchy of mid-level motion components. This hierarchy is a data-driven decomposition specific to each video.  ...  We model tracklets using multiple features describing both their spatio-temporal position and shape.  ... 
doi:10.5244/c.26.30 dblp:conf/bmvc/GaidonHS12 fatcat:c3tgkymblvecdpvbk77f4n6hiq

The LICORS Cabinet: Nonparametric Algorithms for Spatio-temporal Prediction [article]

George D. Montanez, Cosma Rohilla Shalizi
2016 arXiv   pre-print
These methods allow for tractable inference of spatio-temporal data, such as full-frame video.  ...  Spatio-temporal data is intrinsically high dimensional, so unsupervised modeling is only feasible if we can exploit structure in the process.  ...  INTRODUCTION Modeling spatio-temporal data, such as high resolution video, is hard. The sheer dimensionality of the data often makes global inference methods difficult.  ... 
arXiv:1506.02686v2 fatcat:zdkivfvlsrgwhekhuznwgnjptu

A Deep Structured Model with Radius–Margin Bound for 3D Human Activity Recognition

Liang Lin, Keze Wang, Wangmeng Zuo, Meng Wang, Jiebo Luo, Lei Zhang
2015 International Journal of Computer Vision  
For model training, we propose a principled learning algorithm that iteratively (i) discovers the optimal latent variables (i.e. the ways of activity decomposition) for all training instances, (ii) updates  ...  To solve this problem, this work investigates a novel deep structured model, which adaptively decomposes an activity instance into temporal parts using the convolutional neural networks (CNNs).  ...  Initialization: Pre-train the spatio-temporal CNNs using the 2D videos. (14) ; 3.  ... 
doi:10.1007/s11263-015-0876-z fatcat:gcmxdmteubdbvcj3ah24mt2bp4

Interactive browsing system for anomaly video surveillance

Tien-Vu Nguyen, Dinh Phung, S. Gupta, S. Venkatesh
2013 2013 IEEE Eighth International Conference on Intelligent Sensors, Sensor Networks and Information Processing  
We present the user with an interface to inspect events that incorporate these rarest factors in a spatial-temporal manner.  ...  We demonstrate the system on a public video data set, showing key aspects of the browsing paradigm.  ...  We demonstrate this browsing paradigm, with spatial and spatio-temporal queries in video data sets. The user interface of our system is displayed in Figure 2.  ... 
doi:10.1109/issnip.2013.6529821 dblp:conf/issnip/NguyenPGV13 fatcat:a6m6smxoljevdfrag5snjuocoy

2020 Index IEEE Transactions on Image Processing Vol. 29

2020 IEEE Transactions on Image Processing  
., +, TIP 2020 2692-2701 Video Captioning With Object-Aware Spatio-Temporal Correlation and Aggregation.  ...  ., +, TIP 2020 2166-2175 Video Captioning With Object-Aware Spatio-Temporal Correlation and Aggregation.  ... 
doi:10.1109/tip.2020.3046056 fatcat:24m6k2elprf2nfmucbjzhvzk3m

Joint Graph Learning and Video Segmentation via Multiple Cues and Topology Calibration

Jingkuan Song, Lianli Gao, Mihai Marian Puscas, Feiping Nie, Fumin Shen, Nicu Sebe
2016 Proceedings of the 2016 ACM on Multimedia Conference - MM '16  
In this paper, we propose a novel framework, joint graph learning and video segmentation (JGLVS), which learns the similarity graph and video segmentation simultaneously.  ...  Video segmentation has become an important and active research area with a large diversity of proposed approaches.  ...  Iteratively repeating this process over multiple levels results in a tree of spatio-temporal segmentations. In order to process long videos, Xu et al.  ... 
doi:10.1145/2964284.2964295 dblp:conf/mm/SongGPNSS16 fatcat:lnwf7sb3dzf2vnlukbfhiq4une

A Markov Clustering Topic Model for mining behaviour in video

Timothy Hospedales, Shaogang Gong, Tao Xiang
2009 2009 IEEE 12th International Conference on Computer Vision  
and behaviour mining in new video data online in real-time.  ...  This paper addresses the problem of fully automated mining of public space video data.  ...  Spatio-Temporal Video Mining Video Representation We wish to construct a generative model capable of automatic mining and screening irregular spatio-temporal patterns as 'salient behaviours' in video  ... 
doi:10.1109/iccv.2009.5459342 dblp:conf/iccv/HospedalesGX09 fatcat:aqswyqrgcngdtkkfmw3d7h24wy

Video (language) modeling: a baseline for generative models of natural videos [article]

Marc'Aurelio Ranzato, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, Sumit Chopra
2016 arXiv   pre-print
We propose a strong baseline model for unsupervised feature learning using video data.  ...  By learning to predict missing frames or extrapolate future frames from an input video sequence, the model discovers both spatial and temporal correlations which are useful to represent complex deformations  ...  ACKNOWLEDGEMENTS The authors would like to acknowledge Piotr Dollar for providing us the optical flow estimator, Manohar Paluri for his help with the data, and all the FAIR team for insightful comments  ... 
arXiv:1412.6604v5 fatcat:cowmyzhjmvgm7oh453rxfqt6ou