
Towards an Interpretable Latent Space in Structured Models for Video Prediction [article]

Rushil Gupta, Vishal Sharma, Yash Jain, Yitao Liang, Guy Van den Broeck, Parag Singla
2021 arXiv   pre-print
We propose an additional decoder-based loss in the pixel space, imposed in a curriculum manner, to further refine the latent space predictions.  ...  We argue that injecting explicit inductive bias into the model, in the form of general physical laws, can help not only make the model more interpretable, but also improve its overall predictions.  ...  Acknowledgements We thank the IIT Delhi HPC facility for computational resources.  ... 
arXiv:2107.07713v1 fatcat:65hyerk4ovevvgtoq7qt6pdjyu

Latent Image Animator: Learning to Animate Images via Latent Space Navigation [article]

Yaohui Wang, Di Yang, Francois Bremond, Antitza Dantcheva
2022 arXiv   pre-print
LIA is streamlined to animate images by linear navigation in the latent space. Specifically, motion in the generated video is constructed by linear displacement of codes in the latent space.  ...  Deviating from such models, we here introduce the Latent Image Animator (LIA), a self-supervised autoencoder that evades the need for structure representation.  ...  It was supported by the French Government, by the National Research Agency (ANR) under Grant ANR-18-CE92-0024, project RESPECT and through the 3IA Côte d'Azur Investments in the Future project managed  ... 
arXiv:2203.09043v1 fatcat:45xws5nlfnc5hgl6mgfcjnl474

LagNetViP: A Lagrangian Neural Network for Video Prediction [article]

Christine Allen-Blanchette, Sushant Veer, Anirudha Majumdar, Naomi Ehrich Leonard
2020 arXiv   pre-print
We demonstrate the efficacy of this approach for video prediction on image sequences rendered in modified OpenAI gym Pendulum-v0 and Acrobot environments.  ...  The dominant paradigms for video prediction rely on opaque transition models where neither the equations of motion nor the underlying physical quantities of the system are easily inferred.  ...  Acknowledgements We thank Shinkyu Park, Desmond Zhong, David Isele and Patricia Posey for their insights.  ... 
arXiv:2010.12932v1 fatcat:yksdiikl25c25joxttgjwioloq

The Pose Knows: Video Forecasting by Generating Pose Futures [article]

Jacob Walker, Kenneth Marino, Abhinav Gupta, Martial Hebert
2017 arXiv   pre-print
By using the structured space of pose as an intermediate representation, we sidestep the problems that GANs have in generating video pixels directly.  ...  First we explicitly model the high level structure of active objects in the scene---humans---and use a VAE to model the possible future movements of humans in the pose space.  ...  Acknowledgements: We thank the NVIDIA Corporation for the donation of GPUs for this research. In addition, this work was supported by NSF grant IIS1227495.  ... 
arXiv:1705.00053v1 fatcat:qao4ijregvh5xhsyvxosv4boj4

Procedure Planning in Instructional Videos [article]

Chien-Yi Chang, De-An Huang, Danfei Xu, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles
2020 arXiv   pre-print
In this paper, we study the problem of procedure planning in instructional videos, which can be seen as a step towards enabling autonomous agents to plan for complex tasks in everyday settings such as  ...  plannable latent space.  ...  Latent Space In this section, we discuss how to use DDN for planning in instructional videos.  ... 
arXiv:1907.01172v3 fatcat:5gix6len5jbh7dhn62qrwihkvm

Deep Variational Luenberger-type Observer for Stochastic Video Prediction [article]

Dong Wang, Feng Zhou, Zheng Yan, Guang Yao, Zongxuan Liu, Wennan Ma, Cewu Lu
2020 arXiv   pre-print
In this work, we study the problem of video prediction by combining the interpretability of stochastic state space models with the representation learning of deep neural networks.  ...  Our model builds upon a variational encoder, which transforms the input video into a latent feature space, and a Luenberger-type observer, which captures the dynamic evolution of the latent features.  ...  In parallel to research on video prediction, stochastic prediction and filtering for time-series data, in particular stochastic state space models, have demonstrated significant progress.  ... 
arXiv:2003.00835v1 fatcat:xof5zvw6mfauvpnl4xevyboahi

Insights from Generative Modeling for Neural Video Compression [article]

Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt
2021 arXiv   pre-print
In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling.  ...  We present recent neural video codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for further improvements inspired by normalizing flows and structured  ...  Second, we improve a popular model for neural video compression, Scale-Space Flow (SSF) [4] .  ... 
arXiv:2107.13136v1 fatcat:mdx27avdzbabxayvhpbtbjx74a

Designing and fabricating materials from fire using sonification and deep learning

Mario Milazzo, Markus J. Buehler
2021 iScience  
, thus creating additional directions in artistic and scientific research through the creative manipulation of data with structural similarities across fields.  ...  This represents the first generation of nature-inspired materials from fire and can be a platform to be used for other natural phenomena in the quest for de novo architectures, geometries, and design ideas  ...  The resulting video with the predicted audio is shown in Video S1.  ... 
doi:10.1016/j.isci.2021.102873 fatcat:eqwvyt2lwrdsjh76ffy6wylv7y

Traversing Latent Space using Decision Ferns [article]

Yan Zuo, Gil Avraham, Tom Drummond
2018 arXiv   pre-print
We present a novel controller module that allows for smooth traversal in the latent space and construct an end-to-end trainable framework.  ...  We explore the applicability of our method for performing spatial transformations as well as kinematics for predicting future latent vectors of a video sequence.  ...  Imposing Kinematics We now move towards the more complex inference task of video prediction.  ... 
arXiv:1812.02636v1 fatcat:bzjy6rnlvfcg3ieyb6awkk7dfu

Incorporating structural knowledge into unsupervised deep learning for two-photon imaging data [article]

Florian Eichin, Maren Hackenberg, Caroline Broichhagen, Antje Kilias, Jan Schmoranzer, Marlene Bartos, Harald Binder
2021 bioRxiv   pre-print
Specifically, we consider variational autoencoders for models that infer a compressed representation of the data in a low-dimensional latent space, allowing for insight into what has been learned.  ...  investigate how unsupervised generative deep learning can be adapted to obtain interpretable models directly at the level of the video frames.  ...  an interpretable general model of neural activity.  ... 
doi:10.1101/2021.05.18.443587 fatcat:hptkzjceffgzxjdzfmhazmzofa

Simple Video Generation using Neural ODEs [article]

David Kanaa and Vikram Voleti and Samira Ebrahimi Kahou and Christopher Pal
2021 arXiv   pre-print
The intuition behind this approach is that these trajectories in latent space could then be extrapolated to generate video frames beyond the time steps for which the model is trained.  ...  A promising direction to do so has been to learn latent variable models that predict the future in latent space and project back to pixels, as suggested in recent literature.  ...  As the objective function used to optimise the parameters of the model, we use a combination of an L2 reconstruction loss in pixel space and an L2 distance between the latent points predicted by the  ... 
arXiv:2109.03292v1 fatcat:qi7jinkfnbfgbibdiwtrkpmc4y

Towards Generalizable Deepfake Detection with Locality-aware AutoEncoder [article]

Mengnan Du, Shiva Pentyala, Yuening Li, Xia Hu
2020 arXiv   pre-print
In the training process, we use a pixel-wise mask to regularize the local interpretation of LAE, forcing the model to learn an intrinsic representation from the forgery region instead of capturing artifacts  ...  We further propose an active learning framework to select challenging candidates for labeling, which requires human masks for less than 3% of the training data, dramatically reducing the annotation  ...  Augmenting Local Interpretability. The goal of local interpretation is to identify the contribution of each pixel in the input image towards a specific model prediction.  ... 
arXiv:1909.05999v2 fatcat:527ms3bicvgb3gxkj44q5tdvzu

DeepHeartBeat: Latent trajectory learning of cardiac cycles using cardiac ultrasounds

Fabian Laumer, Gabriel Fringeli, Alina Dubatovka, Laura Manduchi, Joachim M. Buhmann
2020 Neural Information Processing Systems  
Our model encodes high dimensional observations by a cyclic trajectory in a lower dimensional space.  ...  We show that the learned parameters describing the latent trajectory are well interpretable and we demonstrate the versatility of our model by successfully applying it to various cardiologically relevant  ...  Acknowledgments FL, AD and LM have been supported by PHRT -SHFN / SWISSHEART Failure Network (JMB, PI); we thank Julia Vogt for valuable discussions.  ... 
dblp:conf/nips/LaumerFDMB20 fatcat:ou5icd5jrvaura5ot3hicrfgi4

Cognitive-inspired Perceptual Model for Driving Automation

Alice Plebe, Gastone Pietro Rosati Papini, Mauro Da Lio
2020 Zenodo  
This paper proposes a neural network model for visual perception in the context of autonomous driving, inspired by human cognition.  ...  We believe that theories about the human mind and its neural organization may reveal precious insights on how to design a more refined perceptual system for driving automation.  ...  bias the internal representation towards the ability to predict the dynamics of objects in the scene.  ... 
doi:10.5281/zenodo.4781214 fatcat:xwecgkruebdvxkpwp3oestl7xu

Probabilistic Video Generation using Holistic Attribute Control [article]

Jiawei He, Andreas Lehrmann, Joseph Marino, Greg Mori, Leonid Sigal
2018 arXiv   pre-print
Variational Autoencoders (VAEs) are used as a means of encoding/decoding frames into/from the latent space, and an RNN as a way to model the dynamics in the latent space.  ...  As a result, given attributes and/or the first frame, our model is able to generate diverse but highly consistent sets of video sequences, accounting for the inherent uncertainty in the prediction task.  ...  The motion of the resulting distribution in the latent space, accounting for the motion in the video, is modeled using an LSTM.  ... 
arXiv:1803.08085v1 fatcat:nzylgu2tyjgr7frm5ismzoze5i