
Towards Accurate Generative Models of Video: A New Metric & Challenges [article]

Thomas Unterthiner, Sjoerd van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, Sylvain Gelly
2019 arXiv   pre-print
To this end we propose Fréchet Video Distance (FVD), a new metric for generative models of video, and StarCraft 2 Videos (SCV), a benchmark of game play from custom StarCraft 2 scenarios that challenge  ...  We contribute a large-scale human study, which confirms that FVD correlates well with qualitative human judgment of generated videos, and provide initial benchmark results on SCV.  ...  Conclusion We introduced the Fréchet Video Distance (FVD), a new evaluation metric for generative models of video, and an important step towards better evaluation of models for video generation.  ... 
arXiv:1812.01717v2 fatcat:aab3klrxwvantmc5ayoccbvaxa

Markov Decision Process for Video Generation [article]

Vladyslav Yushchenko, Nikita Araslanov, Stefan Roth
2019 arXiv   pre-print
To address this, we reformulate the problem of video generation as a Markov Decision Process (MDP).  ...  We identify two pathological cases of temporal inconsistencies in video generation: video freezing and video looping.  ...  The authors thank Sergey Tulyakov and Masaki Saito for helpful clarifications.  ... 
arXiv:1909.12400v1 fatcat:iqf5jzevj5bbjoulgx2oxsgxry

Action-conditioned Benchmarking of Robotic Video Prediction Models: a Comparative Study [article]

Manuel Serra Nunes, Atabak Dehban, Plinio Moreno, José Santos-Victor
2019 arXiv   pre-print
In this paper, we propose a new metric to compare different video prediction models based on this argument.  ...  However, a comprehensive method for determining the fitness of different video prediction models at guiding the selection of actions is yet to be developed.  ...  Acknowledgements This work is partially supported by the Portuguese Foundation for Science and Technology (FCT) project [UID/EEA/50009/2019].  ... 
arXiv:1910.02564v1 fatcat:vd2djvrgffbxxl6twijvwby2mm

DVC-P: Deep Video Compression with Perceptual Optimizations [article]

Saiping Zhang, Marta Mrak, Luis Herranz, Marc Górriz, Shuai Wan, Fuzheng Yang
2021 arXiv   pre-print
Experimental results demonstrate that, compared with the baseline DVC, our proposed method can generate videos with higher perceptual quality achieving 12.27% reduction in a perceptual BD-rate equivalent  ...  Specifically, a discriminator network and a mixed loss are employed to help our network trade off among distortion, perception and rate.  ...  The GOP size is 10, and the first 100 frames are tested for each sequence. 1) Perceptual Video Quality Metric: We test perceptual quality of decoded videos by FVD.  ... 
arXiv:2109.10849v2 fatcat:yybw353jm5cxhet3kgjczuxddy

Latent Video Transformer [article]

Ruslan Rakhimov, Denis Volkhonskiy, Alexey Artemov, Denis Zorin, Evgeny Burnaev
2020 arXiv   pre-print
The video generation task can be formulated as a prediction of future video frames given some past frames. Recent generative models for videos face the problem of high computational requirements.  ...  Some models require up to 512 Tensor Processing Units for parallel training. In this work, we address this problem via modeling the dynamics in a latent space.  ...  They also acknowledge Vage Egiazarian for thoughtful discussions of the model and the experiments.  ... 
arXiv:2006.10704v1 fatcat:lzq7cokewzdrzcqemo7hmv3za4

Scaling Autoregressive Video Models [article]

Dirk Weissenborn, Oscar Täckström, Jakob Uszkoreit
2020 arXiv   pre-print
In contrast, we show that conceptually simple autoregressive video generation models based on a three-dimensional self-attention mechanism achieve competitive results across multiple metrics on popular  ...  Due to the statistical complexity of video, the high degree of inherent stochasticity, and the sheer amount of data, generating natural video remains a challenging task.  ...  We would also like to thank Chelsea Finn and Tom Kwiatkowski for thoughtful comments on an earlier draft.  ... 
arXiv:1906.02634v3 fatcat:eu3jxenc5jg3dfvfxin3hdpnm4

Transframer: Arbitrary Frame Prediction with Generative Models [article]

Charlie Nash, João Carreira, Jacob Walker, Iain Barr, Andrew Jaegle, Mateusz Malinowski, Peter Battaglia
2022 arXiv   pre-print
We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction.  ...  Transframer is the state-of-the-art on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30-second videos from a single  ...  Towards accurate generative models of video: A new metric & challenges. ICLR Workshops (2019).  ... 
arXiv:2203.09494v3 fatcat:4hfy5x53vbdv3bknwxr5kt67va

Novel View Video Prediction Using a Dual Representation [article]

Sarah Shiraz, Krishna Regmi, Shruti Vyas, Yogesh S. Rawat, Mubarak Shah
2021 arXiv   pre-print
Moreover, our method relies only on RGB frames to learn a dual representation which is used to generate the video from a novel viewpoint.  ...  We address the problem of novel view video prediction; given a set of input video clips from a single/multiple views, our network is able to predict the video from a novel view.  ...  [20] Thomas Unterthiner, Sjoerd van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, and Sylvain Gelly, "Towards accurate generative models of video: A new metric & challenges," arXiv  ... 
arXiv:2106.03956v1 fatcat:lbxbfbjynja3hnznwwc66z5zmq

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction [article]

Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee, Seunghoon Hong
2021 arXiv   pre-print
(i.e., thousands of frames), setting a new standard of video prediction with orders of magnitude longer prediction time than existing approaches.  ...  We evaluate our method on three challenging datasets involving car driving and human dancing, and demonstrate that it can generate complicated scene structures and motions over a very long time horizon.  ...  For the video-level evaluation, we adopt FVD, which measures a Fréchet distance between the ground-truth videos and the generated ones in a video representation space.  ... 
arXiv:2104.06697v1 fatcat:thkaq2a53fhyzedof52lzw7xtm
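Several of the entries above evaluate with FVD, described in the previous result as a Fréchet distance between ground-truth and generated videos in a learned representation space. As a minimal sketch (not the official implementation from the Unterthiner et al. paper listed above), the core computation is the 2-Wasserstein distance between Gaussian fits of two feature sets; it assumes video-level features have already been extracted with a pretrained network such as I3D:

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fit to two (N, D) feature arrays.

    d^2 = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2*(S1 S2)^{1/2})
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)
    diff = mu1 - mu2
    # Matrix square root of the covariance product; discard the error estimate.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # tiny imaginary parts arise from numerical error
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical feature sets give a distance near zero, and a constant mean shift of 5 in every one of D dimensions contributes 25·D to the squared-mean term, which makes the formula easy to sanity-check.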

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2 [article]

Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny
2022 arXiv   pre-print
We can generate arbitrarily long videos at an arbitrarily high frame rate, while prior work struggles to generate even 64 frames at a fixed rate.  ...  This decreases the training cost and provides a richer learning signal to the generator, making it possible to train directly on 1024^2 videos for the first time.  ...  Fréchet Video Distance (FVD) [68] serves as the main metric for video synthesis, but there is no complete official implementation for it (see §4 and Appx C).  ... 
arXiv:2112.14683v3 fatcat:qnsoi4xsgbglfgmumpioveuvjq

StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN [article]

Gereon Fox and Ayush Tewari and Mohamed Elgharib and Christian Theobalt
2021 arXiv   pre-print
After training, our model can not only generate new portrait videos for the training subject, but also for any random subject which can be embedded in the StyleGAN space.  ...  We present a novel approach to the video synthesis problem that helps to greatly improve visual quality and drastically reduce the amount of training data and resources necessary for generating videos.  ...  We also thank Pramod Rao for his invaluable support in conducting the experiments for our evaluation section. This work was supported by the ERC Consolidator Grant 4DReply (770784).  ... 
arXiv:2107.07224v2 fatcat:2lbxp7tenjbc7febnov6d57ql4

S-Flow GAN [article]

Yakov Miron, Yona Coscas
2019 arXiv   pre-print
Our work offers a new method for domain translation from semantic label maps and Computer Graphics (CG) simulation edge map images to photo-realistic images.  ...  We train a Generative Adversarial Network (GAN) in a conditional way to generate a photo-realistic version of a given CG scene.  ...  FVD is a metric for evaluating video generation models, and it uses a modified version of FID. We calculated the FVD score for our generated video (Ours-vid) w.r.t. the Oracle (real video) and did the same  ... 
arXiv:1905.08474v2 fatcat:3thpv32pefgh3fwa6k2ddzii5u

Stochastic Image-to-Video Synthesis using cINNs [article]

Michael Dorkenwald, Timo Milbich, Andreas Blattmann, Robin Rombach, Konstantinos G. Derpanis, Björn Ommer
2021 arXiv   pre-print
In contrast to common stochastic image-to-video synthesis, such a model does not merely generate arbitrary videos progressing the initial image.  ...  for controlled video synthesis.  ...  Dynamic Texture FVD (DTFVD): In Sec. 4.3 of our main paper, we introduced a dedicated FVD metric for the domain of dynamic textures, the Dynamic Texture Fréchet Video Distance (DTFVD).  ... 
arXiv:2105.04551v2 fatcat:sye4z4og6vfghardh3edxsc7pi

Playable Video Generation [article]

Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci
2021 arXiv   pre-print
We propose a novel framework for PVG that is trained in a self-supervised manner on a large dataset of unlabelled videos.  ...  In PVG, we aim at allowing a user to control the generated video by selecting a discrete action at every time step, as when playing a video game.  ...  From this observation, we propose a new task, Playable Video Generation (PVG), illustrated in Fig. 1a.  ... 
arXiv:2101.12195v1 fatcat:rl2xllly2zb4ddhrbumt5elgrm

CCVS: Context-aware Controllable Video Synthesis [article]

Guillaume Le Moing and Jean Ponce and Cordelia Schmid
2021 arXiv   pre-print
This presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones, with several new key elements for improved spatial resolution and realism: It conditions  ...  by affording simple mechanisms for handling multimodal ancillary information for controlling the synthesis process (e.g., a few sample frames, an audio track, a trajectory in image space) and taking into  ...  We thank the reviewers for useful comments.  ... 
arXiv:2107.08037v2 fatcat:xmpykxzkz5cxrppejzlc5u6i6i
Showing results 1 — 15 out of 102