A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf
.
Filters
Self-Conditioned Probabilistic Learning of Video Rescaling
[article]
2021
arXiv
pre-print
In this paper, we propose a self-conditioned probabilistic framework for video rescaling to learn the paired downscaling and upscaling procedures simultaneously. ...
Extensive experimental results demonstrate the superiority of our approach on video rescaling, video compression, and efficient action recognition tasks. ...
Acknowledgement This work was supported by the National Science Foundation of China (61831015, 61527804 and U1908210). ...
arXiv:2107.11639v2
fatcat:k7g4ewgwhbatzlyk2z26pub2j4
Exploiting Probabilistic Depth-Aware Object Tracking with a Conditional Variational Autoencoder
2021
IEEE Access
ACKNOWLEDGMENTS
Part of this work has been accepted by the International ...
The network is a combination of a CVAE and a Siamese tracking network. Its objective is to learn a conditional probabilistic model over object features and states. ...
EXPLOITING PROBABILISTIC DEPTH-AWARE OBJECT TRACKING WITH A CONDITIONAL VARIATIONAL AUTOENCODER In this paper, we pose object tracking as a segmentation task in video clips and then generate a tracking ...
doi:10.1109/access.2021.3092886
fatcat:c7qz334pivcdxp44t7rvqw4iri
Simultaneous Bayesian inference of motion velocity fields and probabilistic models in successive video-frames described by spatio-temporal MRFs
[article]
2010
arXiv
pre-print
We next attempt to estimate the optimal values of hyper-parameters including the regularization term, which define our probabilistic model macroscopically, by using the Boltzmann-machine type learning ...
To avoid this difficulty, we rescale the regularization term by introducing a scaling factor and optimizing it by means of minimization of the mean-square error. ...
He also thanks Saha Institute of Nuclear Physics for their warm hospitality during his stay in India. ...
arXiv:1004.3629v1
fatcat:vt4nwzouufbg7jpe62u2wyfwfu
Patch-based probabilistic image quality assessment for face selection and improved video-based face recognition
2011
CVPR 2011 WORKSHOPS
In video based face recognition, face images are typically captured over multiple frames in uncontrolled conditions, where head pose, illumination, shadowing, motion blur and focus change over the sequence ...
We propose an efficient patch-based face image quality assessment algorithm which quantifies the similarity of a face image to a probabilistic face model, representing an "ideal" face. ...
ICT Centre of Excellence program. ...
doi:10.1109/cvprw.2011.5981881
dblp:conf/cvpr/WongCMSL11
fatcat:acbmx5lceja7pju4dio7bv6rga
Real-Time Adaptive Hand Motion Recognition Using a Sparse Bayesian Classifier
[chapter]
2005
Lecture Notes in Computer Science
The classifier is designed in a way that it supports online incremental learning and it can be thus re-trained to increase its adaptability to an input captured under a new condition. ...
In this work, recognition is done by firstly extracting a motion gradient orientation image from a raw video input and then classifying a feature vector generated from this image to one of the 10 gestures ...
Both training and testing data were video captured under arbitrary room conditions (with various backgrounds and lighting). ...
doi:10.1007/11573425_17
fatcat:ik3i7tkfgfe2nicn7e3254agpa
Learning Sequential Latent Variable Models from Multimodal Time Series Data
[article]
2022
arXiv
pre-print
In this work, we present a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data and the respective dynamics. ...
Furthermore, we compare to the common learning baseline of concatenating each modality in the latent space and show that our principled probabilistic formulation performs better. ...
Probabilistic methods have also been applied to model the joint and conditional distributions of non-sequential multimodal data. ...
arXiv:2204.10419v1
fatcat:cwdyp233k5afpfbkccehdvyalm
Probabilistic Curve Learning: Coulomb Repulsion and the Electrostatic Gaussian Process
[article]
2015
arXiv
pre-print
However, there is a clear lack of probabilistic methods that allow learning of the manifold along with the generative distribution of the observed data. ...
missing frames in video. ...
Learn the one dimensional coordinate x x x 0 by your favorite distance-preserving manifold learning algorithm and rescale x x x 0 into p0, 1q; 2. ...
arXiv:1506.03768v1
fatcat:3wroqhtphbam5ldniezqiw5luu
Unsupervised Video Decomposition using Spatio-temporal Iterative Inference
[article]
2020
arXiv
pre-print
Unsupervised multi-object scene decomposition is a fast-emerging problem in representation learning. ...
Despite significant progress in static scenes, such models are unable to leverage important dynamic cues present in video. ...
Acknowledgments and Disclosure of Funding This work was funded, in part, by the Vector Institute for AI, Canada CIFAR AI Chair, NSERC CRC and an NSERC DG and Discovery Accelerator Grants. ...
arXiv:2006.14727v1
fatcat:pvk3z4jqe5fkzogaaf7o73pz6i
High-Quality Video Generation from Static Structural Annotations
2020
International Journal of Computer Vision
We employ a cycle-consistent flow-based conditioned variational autoencoder to capture the long-term motion distributions, by which the learned bi-directional flows ensure the physical reliability of the ...
This paper proposes a novel unsupervised video generation that is conditioned on a single structural annotation map, which in contrast to prior conditioned video generation approaches, provides a good ...
Acknowledgements This work was supported in part by the National Natural Science Foundation of China under Grant No. 61906012, and in part by Singapore MOE AcRF Tier 1 (2018-T1-002-056), NTU SUG, and NTU ...
doi:10.1007/s11263-020-01334-x
fatcat:yedge4qmcbd2jpyz6bo3n5fbqe
Cognition-Based Networks: A New Perspective on Network Optimization Using Learning and Distributed Intelligence
2015
IEEE Access
The proposed approach develops around the systematic application of advanced machine learning techniques and, in particular, unsupervised deep learning and probabilistic generative models for system-wide ...
strategies at the system level, thus fully unleashing the potential of the learning approach. ...
Therefore, self-organized networks shall not just be able to autonomously adapt to changing conditions, but also to learn based on experience. ...
doi:10.1109/access.2015.2471178
fatcat:uyyc6x7m5fhcxfqteh3x5kgcqe
Deep Multimodal Representation Learning from Temporal Data
[article]
2017
arXiv
pre-print
In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications. ...
When the available modalities consist of time series data such as video, audio and sensor signals, it becomes imperative to consider their temporal structure during the fusion process. ...
We pre-processed the video frames to extract only the region of interest containing the mouth, and rescaled each image to 60 × 60 pixels. ...
arXiv:1704.03152v1
fatcat:gvt7fych3nfjnb7c7lbph4o37e
Face Recognition from Video Using the Generic Shape-Illumination Manifold
[chapter]
2006
Lecture Notes in Computer Science
unseen head poses; and (iii) we introduce an accurate video sequence "reillumination" algorithm to achieve robustness to face motion patterns in video. ...
The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have ...
This is the focus of our current work. Additionally, we would like to improve the computational efficiency of the method by representing each FMM by a strategically chosen set of sparse samples. ...
doi:10.1007/11744085_3
fatcat:aj2f6vppwrb3thtqi5pvxjt4pq
Deep Multimodal Representation Learning from Temporal Data
2017
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications. ...
When the available modalities consist of time series data such as video, audio and sensor signals, it becomes imperative to consider their temporal structure during the fusion process. ...
We pre-processed the video frames to extract only the region of interest containing the mouth, and rescaled each image to 60 × 60 pixels. ...
doi:10.1109/cvpr.2017.538
dblp:conf/cvpr/YangRCMBL17
fatcat:x4wfnk2a2neyvajwm7lrh34qvy
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
[article]
2018
arXiv
pre-print
We propose to learn such a representation in a self-supervised way, by training a neural network to predict whether video frames and audio are temporally aligned. ...
We use this learned representation for three applications: (a) sound source localization, i.e. visualizing the source of sound in a video; (b) audio-visual action recognition; and (c) on/off-screen audio ...
conditional on the (audio-visual) video I. ...
arXiv:1804.03641v2
fatcat:ah3cazadn5fzzhzqeyrvhs73yy
Real-time Interpretation of Hand Motions using a Sparse Bayesian Classifier on Motion Gradient Orientation Images
2005
Procedings of the British Machine Vision Conference 2005
Compared with other recently proposed methods that involve the use of hand tracking, the system can work reliably in real-time without relying on accurate tracking, and give a probabilistic output that ...
In this work, a motion gradient orientation image is extracted directly from a raw video input and transformed to a motion feature vector. ...
Both training and testing data are video captured under arbitrary room conditions (with various backgrounds and lighting). ...
doi:10.5244/c.19.41
dblp:conf/bmvc/WongC05
fatcat:goizwonrvndwldgvcvao7cfrri
« Previous
Showing results 1 — 15 out of 1,046 results