Self-Conditioned Probabilistic Learning of Video Rescaling [article]

Yuan Tian, Guo Lu, Xiongkuo Min, Zhaohui Che, Guangtao Zhai, Guodong Guo, Zhiyong Gao
2021 arXiv   pre-print
In this paper, we propose a self-conditioned probabilistic framework for video rescaling to learn the paired downscaling and upscaling procedures simultaneously.  ...  Extensive experimental results demonstrate the superiority of our approach on video rescaling, video compression, and efficient action recognition tasks.  ...  Acknowledgement This work was supported by the National Science Foundation of China (61831015, 61527804 and U1908210).  ... 
arXiv:2107.11639v2 fatcat:k7g4ewgwhbatzlyk2z26pub2j4

Exploiting Probabilistic Depth-Aware Object Tracking with a Conditional Variational Autoencoder

Wenhui Huang, Jason Gu, Yinchen Guo
2021 IEEE Access  
ACKNOWLEDGMENTS Part of this work has been accepted by the International  ...  The network is a combination of a CVAE and a Siamese tracking network. Its objective is to learn a conditional probabilistic model over object features and states.  ...  EXPLOITING PROBABILISTIC DEPTH-AWARE OBJECT TRACKING WITH A CONDITIONAL VARIATIONAL AUTOENCODER In this paper, we pose object tracking as a segmentation task in video clips and then generate a tracking  ... 
doi:10.1109/access.2021.3092886 fatcat:c7qz334pivcdxp44t7rvqw4iri

Simultaneous Bayesian inference of motion velocity fields and probabilistic models in successive video-frames described by spatio-temporal MRFs [article]

Yuya Inagaki, Jun-ichi Inoue
2010 arXiv   pre-print
We next attempt to estimate the optimal values of hyper-parameters including the regularization term, which define our probabilistic model macroscopically, by using the Boltzmann-machine type learning  ...  To avoid this difficulty, we rescale the regularization term by introducing a scaling factor and optimizing it by means of minimization of the mean-square error.  ...  He also thanks Saha Institute of Nuclear Physics for their warm hospitality during his stay in India.  ... 
arXiv:1004.3629v1 fatcat:vt4nwzouufbg7jpe62u2wyfwfu

Patch-based probabilistic image quality assessment for face selection and improved video-based face recognition

Yongkang Wong, Shaokang Chen, Sandra Mau, Conrad Sanderson, Brian C. Lovell
2011 CVPR 2011 WORKSHOPS  
In video based face recognition, face images are typically captured over multiple frames in uncontrolled conditions, where head pose, illumination, shadowing, motion blur and focus change over the sequence  ...  We propose an efficient patch-based face image quality assessment algorithm which quantifies the similarity of a face image to a probabilistic face model, representing an "ideal" face.  ...  ICT Centre of Excellence program.  ... 
doi:10.1109/cvprw.2011.5981881 dblp:conf/cvpr/WongCMSL11 fatcat:acbmx5lceja7pju4dio7bv6rga

Real-Time Adaptive Hand Motion Recognition Using a Sparse Bayesian Classifier [chapter]

Shu-Fai Wong, Roberto Cipolla
2005 Lecture Notes in Computer Science  
The classifier is designed in a way that it supports online incremental learning and it can be thus re-trained to increase its adaptability to an input captured under a new condition.  ...  In this work, recognition is done by firstly extracting a motion gradient orientation image from a raw video input and then classifying a feature vector generated from this image to one of the 10 gestures  ...  Both training and testing data were video captured under arbitrary room conditions (with various backgrounds and lighting).  ... 
doi:10.1007/11573425_17 fatcat:ik3i7tkfgfe2nicn7e3254agpa

Learning Sequential Latent Variable Models from Multimodal Time Series Data [article]

Oliver Limoyo, Trevor Ablett, Jonathan Kelly
2022 arXiv   pre-print
In this work, we present a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data and the respective dynamics.  ...  Furthermore, we compare to the common learning baseline of concatenating each modality in the latent space and show that our principled probabilistic formulation performs better.  ...  Probabilistic methods have also been applied to model the joint and conditional distributions of non-sequential multimodal data.  ... 
arXiv:2204.10419v1 fatcat:cwdyp233k5afpfbkccehdvyalm

Probabilistic Curve Learning: Coulomb Repulsion and the Electrostatic Gaussian Process [article]

Ye Wang, David B. Dunson
2015 arXiv   pre-print
However, there is a clear lack of probabilistic methods that allow learning of the manifold along with the generative distribution of the observed data.  ...  missing frames in video.  ...  Learn the one-dimensional coordinate x_0 by your favorite distance-preserving manifold learning algorithm and rescale x_0 into (0, 1); 2.  ... 
arXiv:1506.03768v1 fatcat:3wroqhtphbam5ldniezqiw5luu

Unsupervised Video Decomposition using Spatio-temporal Iterative Inference [article]

Polina Zablotskaia, Edoardo A. Dominici, Leonid Sigal, Andreas M. Lehrmann
2020 arXiv   pre-print
Unsupervised multi-object scene decomposition is a fast-emerging problem in representation learning.  ...  Despite significant progress in static scenes, such models are unable to leverage important dynamic cues present in video.  ...  Acknowledgments and Disclosure of Funding This work was funded, in part, by the Vector Institute for AI, Canada CIFAR AI Chair, NSERC CRC and an NSERC DG and Discovery Accelerator Grants.  ... 
arXiv:2006.14727v1 fatcat:pvk3z4jqe5fkzogaaf7o73pz6i

High-Quality Video Generation from Static Structural Annotations

Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Chen Change Loy
2020 International Journal of Computer Vision  
We employ a cycle-consistent flow-based conditioned variational autoencoder to capture the long-term motion distributions, by which the learned bi-directional flows ensure the physical reliability of the  ...  This paper proposes a novel unsupervised video generation that is conditioned on a single structural annotation map, which in contrast to prior conditioned video generation approaches, provides a good  ...  Acknowledgements This work was supported in part by the National Natural Science Foundation of China under Grant No. 61906012, and in part by Singapore MOE AcRF Tier 1 (2018-T1-002-056), NTU SUG, and NTU  ... 
doi:10.1007/s11263-020-01334-x fatcat:yedge4qmcbd2jpyz6bo3n5fbqe

Cognition-Based Networks: A New Perspective on Network Optimization Using Learning and Distributed Intelligence

Michele Zorzi, Andrea Zanella, Alberto Testolin, Michele De Filippo De Grazia, Marco Zorzi
2015 IEEE Access  
The proposed approach develops around the systematic application of advanced machine learning techniques and, in particular, unsupervised deep learning and probabilistic generative models for system-wide  ...  strategies at the system level, thus fully unleashing the potential of the learning approach.  ...  Therefore, self-organized networks shall not just be able to autonomously adapt to changing conditions, but also to learn based on experience.  ... 
doi:10.1109/access.2015.2471178 fatcat:uyyc6x7m5fhcxfqteh3x5kgcqe

Deep Multimodal Representation Learning from Temporal Data [article]

Xitong Yang, Palghat Ramesh, Radha Chitta, Sriganesh Madhvanath, Edgar A. Bernal, Jiebo Luo
2017 arXiv   pre-print
In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications.  ...  When the available modalities consist of time series data such as video, audio and sensor signals, it becomes imperative to consider their temporal structure during the fusion process.  ...  We pre-processed the video frames to extract only the region of interest containing the mouth, and rescaled each image to 60 × 60 pixels.  ... 
arXiv:1704.03152v1 fatcat:gvt7fych3nfjnb7c7lbph4o37e

Face Recognition from Video Using the Generic Shape-Illumination Manifold [chapter]

Ognjen Arandjelović, Roberto Cipolla
2006 Lecture Notes in Computer Science  
unseen head poses; and (iii) we introduce an accurate video sequence "reillumination" algorithm to achieve robustness to face motion patterns in video.  ...  The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have  ...  This is the focus of our current work. Additionally, we would like to improve the computational efficiency of the method by representing each FMM by a strategically chosen set of sparse samples.  ... 
doi:10.1007/11744085_3 fatcat:aj2f6vppwrb3thtqi5pvxjt4pq

Deep Multimodal Representation Learning from Temporal Data

Xitong Yang, Palghat Ramesh, Radha Chitta, Sriganesh Madhvanath, Edgar A. Bernal, Jiebo Luo
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications.  ...  When the available modalities consist of time series data such as video, audio and sensor signals, it becomes imperative to consider their temporal structure during the fusion process.  ...  We pre-processed the video frames to extract only the region of interest containing the mouth, and rescaled each image to 60 × 60 pixels.  ... 
doi:10.1109/cvpr.2017.538 dblp:conf/cvpr/YangRCMBL17 fatcat:x4wfnk2a2neyvajwm7lrh34qvy

Audio-Visual Scene Analysis with Self-Supervised Multisensory Features [article]

Andrew Owens, Alexei A. Efros
2018 arXiv   pre-print
We propose to learn such a representation in a self-supervised way, by training a neural network to predict whether video frames and audio are temporally aligned.  ...  We use this learned representation for three applications: (a) sound source localization, i.e. visualizing the source of sound in a video; (b) audio-visual action recognition; and (c) on/off-screen audio  ...  conditional on the (audio-visual) video I.  ... 
arXiv:1804.03641v2 fatcat:ah3cazadn5fzzhzqeyrvhs73yy

Real-time Interpretation of Hand Motions using a Sparse Bayesian Classifier on Motion Gradient Orientation Images

S.-F. Wong, R. Cipolla
2005 Proceedings of the British Machine Vision Conference 2005  
Compared with other recently proposed methods that involve the use of hand tracking, the system can work reliably in real-time without relying on accurate tracking, and give a probabilistic output that  ...  In this work, a motion gradient orientation image is extracted directly from a raw video input and transformed to a motion feature vector.  ...  Both training and testing data are video captured under arbitrary room conditions (with various backgrounds and lighting).  ... 
doi:10.5244/c.19.41 dblp:conf/bmvc/WongC05 fatcat:goizwonrvndwldgvcvao7cfrri