A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
Towards an Interpretable Latent Space in Structured Models for Video Prediction
[article]
2021
arXiv
pre-print
We propose an additional decoder-based loss in the pixel space, imposed in a curriculum manner, to further refine the latent space predictions. ...
We argue that injecting explicit inductive bias into the model, in the form of general physical laws, can help not only make the model more interpretable but also improve its overall predictions. ...
Acknowledgements We thank IIT Delhi HPC facility for computational resources. ...
arXiv:2107.07713v1
fatcat:65hyerk4ovevvgtoq7qt6pdjyu
Latent Image Animator: Learning to Animate Images via Latent Space Navigation
[article]
2022
arXiv
pre-print
LIA is streamlined to animate images by linear navigation in the latent space. Specifically, motion in generated video is constructed by linear displacement of codes in the latent space. ...
Deviating from such models, we here introduce the Latent Image Animator (LIA), a self-supervised autoencoder that evades the need for structure representation. ...
It was supported by the French Government, by the National Research Agency (ANR) under Grant ANR-18-CE92-0024, project RESPECT and through the 3IA Côte d'Azur Investments in the Future project managed ...
arXiv:2203.09043v1
fatcat:45xws5nlfnc5hgl6mgfcjnl474
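As a rough illustration of the linear latent-space navigation the LIA snippet describes, the sketch below constructs per-frame latent codes as linear displacements of a source code. All names, shapes, and the random "learned" motion basis are purely illustrative assumptions, not the paper's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_directions, n_frames = 64, 8, 5

z_source = rng.normal(size=latent_dim)                    # latent code of the source image
directions = rng.normal(size=(n_directions, latent_dim))  # stand-in for a learned motion basis
weights = rng.normal(size=n_directions)                   # mixing weights over directions
magnitudes = np.linspace(0.0, 1.0, n_frames)              # per-frame displacement scale

# Motion as linear displacement: each frame's code is the source code
# plus a scaled step along one combined direction in latent space.
displacement = weights @ directions
frames = [z_source + m * displacement for m in magnitudes]
```

A decoder would then map each `frames[i]` back to pixels; with zero magnitude the code (and hence the decoded image) is unchanged.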
LagNetViP: A Lagrangian Neural Network for Video Prediction
[article]
2020
arXiv
pre-print
We demonstrate the efficacy of this approach for video prediction on image sequences rendered in modified OpenAI gym Pendulum-v0 and Acrobot environments. ...
The dominant paradigms for video prediction rely on opaque transition models where neither the equations of motion nor the underlying physical quantities of the system are easily inferred. ...
Acknowledgements We thank Shinkyu Park, Desmond Zhong, David Isele and Patricia Posey for their insights. ...
arXiv:2010.12932v1
fatcat:yksdiikl25c25joxttgjwioloq
The Pose Knows: Video Forecasting by Generating Pose Futures
[article]
2017
arXiv
pre-print
By using the structured space of pose as an intermediate representation, we sidestep the problems that GANs have in generating video pixels directly. ...
First we explicitly model the high level structure of active objects in the scene---humans---and use a VAE to model the possible future movements of humans in the pose space. ...
Acknowledgements: We thank the NVIDIA Corporation for the donation of GPUs for this research. In addition, this work was supported by NSF grant IIS1227495. ...
arXiv:1705.00053v1
fatcat:qao4ijregvh5xhsyvxosv4boj4
Procedure Planning in Instructional Videos
[article]
2020
arXiv
pre-print
In this paper, we study the problem of procedure planning in instructional videos, which can be seen as a step towards enabling autonomous agents to plan for complex tasks in everyday settings such as ...
plannable latent space. ...
Latent Space In this section, we discuss how to use DDN for planning in instructional videos. ...
arXiv:1907.01172v3
fatcat:5gix6len5jbh7dhn62qrwihkvm
Deep Variational Luenberger-type Observer for Stochastic Video Prediction
[article]
2020
arXiv
pre-print
In this work, we study the problem of video prediction by combining interpretability of stochastic state space models and representation learning of deep neural networks. ...
Our model builds upon a variational encoder, which transforms the input video into a latent feature space, and a Luenberger-type observer, which captures the dynamic evolution of the latent features. ...
In parallel to research on video prediction, stochastic prediction and filtering for time-series data, in particular the stochastic state space model, have demonstrated significant progress. ...
arXiv:2003.00835v1
fatcat:xof5zvw6mfauvpnl4xevyboahi
Insights from Generative Modeling for Neural Video Compression
[article]
2021
arXiv
pre-print
In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. ...
We present recent neural video codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for further improvements inspired by normalizing flows and structured ...
Second, we improve a popular model for neural video compression, Scale-Space Flow (SSF) [4] . ...
arXiv:2107.13136v1
fatcat:mdx27avdzbabxayvhpbtbjx74a
Designing and fabricating materials from fire using sonification and deep learning
2021
iScience
, thus creating additional directions in artistic and scientific research through the creative manipulation of data with structural similarities across fields. ...
This represents the first generation of nature-inspired materials from fire and can be a platform to be used for other natural phenomena in the quest for de novo architectures, geometries, and design ideas ...
The resulting video with the predicted audio is shown in Video S1. ...
doi:10.1016/j.isci.2021.102873
fatcat:eqwvyt2lwrdsjh76ffy6wylv7y
Traversing Latent Space using Decision Ferns
[article]
2018
arXiv
pre-print
We present a novel controller module that allows for smooth traversal in the latent space and construct an end-to-end trainable framework. ...
We explore the applicability of our method for performing spatial transformations as well as kinematics for predicting future latent vectors of a video sequence. ...
Imposing Kinematics We now move towards the more complex inference task of video prediction. ...
arXiv:1812.02636v1
fatcat:bzjy6rnlvfcg3ieyb6awkk7dfu
Incorporating structural knowledge into unsupervised deep learning for two-photon imaging data
[article]
2021
bioRxiv
pre-print
Specifically, we consider variational autoencoders for models that infer a compressed representation of the data in a low-dimensional latent space, allowing for insight into what has been learned. ...
We investigate how unsupervised generative deep learning can be adapted to obtain interpretable models directly at the level of the video frames. ...
an interpretable general model of neural activity. ...
doi:10.1101/2021.05.18.443587
fatcat:hptkzjceffgzxjdzfmhazmzofa
Simple Video Generation using Neural ODEs
[article]
2021
arXiv
pre-print
The intuition behind this approach is that these trajectories in latent space could then be extrapolated to generate video frames beyond the time steps for which the model is trained. ...
A promising direction to do so has been to learn latent variable models that predict the future in latent space and project back to pixels, as suggested in recent literature. ...
In terms of the objective function used to optimise the model's parameters, we use a combination of an L2 reconstruction loss in pixel space and an L2 distance between the latent points predicted by the ...
arXiv:2109.03292v1
fatcat:qi7jinkfnbfgbibdiwtrkpmc4y
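The objective the snippet above describes, an L2 reconstruction term in pixel space combined with an L2 distance between predicted and target latent points, can be sketched as follows. The function name and the weighting factor `lam` are assumptions for illustration; the snippet does not specify how the two terms are balanced:

```python
import numpy as np

def combined_loss(x_pred, x_true, z_pred, z_true, lam=1.0):
    """Combined objective: L2 reconstruction in pixel space plus a
    weighted L2 distance between predicted and encoded latent points."""
    pixel_term = np.mean((x_pred - x_true) ** 2)   # reconstruction in pixel space
    latent_term = np.mean((z_pred - z_true) ** 2)  # distance in latent space
    return pixel_term + lam * latent_term
```

With perfect reconstruction and perfectly predicted latents both terms vanish, so the loss is zero.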
Towards Generalizable Deepfake Detection with Locality-aware AutoEncoder
[article]
2020
arXiv
pre-print
In the training process, we use a pixel-wise mask to regularize the local interpretation of LAE, forcing the model to learn intrinsic representations from the forgery region instead of capturing artifacts ...
We further propose an active learning framework to select the challenging candidates for labeling, which requires human masks for less than 3% of the training data, dramatically reducing the annotation ...
Augmenting Local Interpretability. The goal of local interpretation is to identify the contributions of each pixel in the input image towards a specific model prediction. ...
arXiv:1909.05999v2
fatcat:527ms3bicvgb3gxkj44q5tdvzu
DeepHeartBeat: Latent trajectory learning of cardiac cycles using cardiac ultrasounds
2020
Neural Information Processing Systems
Our model encodes high dimensional observations by a cyclic trajectory in a lower dimensional space. ...
We show that the learned parameters describing the latent trajectory are readily interpretable, and we demonstrate the versatility of our model by successfully applying it to various cardiologically relevant ...
Acknowledgments FL, AD and LM have been supported by PHRT -SHFN / SWISSHEART Failure Network (JMB, PI); we thank Julia Vogt for valuable discussions. ...
dblp:conf/nips/LaumerFDMB20
fatcat:ou5icd5jrvaura5ot3hicrfgi4
Cognitive-inspired Perceptual Model for Driving Automation
2020
Zenodo
This paper proposes a neural network model for visual perception in the context of autonomous driving, inspired by human cognition. ...
We believe that the theories about the human mind and its neural organization may reveal precious insights on how to design a more refined perceptual system for driving automation. ...
bias the internal representation towards the ability to predict the dynamics of objects in the scene. ...
doi:10.5281/zenodo.4781214
fatcat:xwecgkruebdvxkpwp3oestl7xu
Probabilistic Video Generation using Holistic Attribute Control
[article]
2018
arXiv
pre-print
Variational Autoencoders (VAEs) are used as a means of encoding/decoding frames into/from the latent space, and an RNN as a way to model the dynamics in the latent space. ...
As a result, given attributes and/or the first frame, our model is able to generate diverse but highly consistent sets of video sequences, accounting for the inherent uncertainty in the prediction task. ...
The motion of the resulting distribution in the latent space, accounting for the motion in the video, is modeled using an LSTM. ...
arXiv:1803.08085v1
fatcat:nzylgu2tyjgr7frm5ismzoze5i
Showing results 1 — 15 out of 18,909 results