3,620 Hits in 8.4 sec

End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [article]

Haojie Liu, Ming Lu, Zhiqi Chen, Xun Cao, Zhan Ma, Yao Wang
2021 arXiv   pre-print
Most algorithms have relied solely on vector-based motion representation and resampling (e.g., optical-flow-based bilinear sampling) to exploit inter-frame redundancy.  ...  Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module using information from the current and multiple past frames.  ...  Note that even if we use a small N during the training, the resulting models for inter coding and residual coding can be used for all P-frames within a group of pictures with more than N frames.  ... 
arXiv:2108.04103v1 fatcat:u43qnoz5pfgmvo3lwsmvr4kk4m
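Several results in this listing build on the vector-based backbone the snippet above mentions: warp a reference frame with a per-pixel flow field, then code only the leftover residual. A minimal 1-D sketch of that pipeline (all arrays, the flow field, and the function names here are invented for illustration, not taken from any of these papers):

```python
import numpy as np

def motion_compensate(ref, flow):
    """Toy 1-D analogue of flow-based bilinear sampling:
    sample ref at fractional positions x + flow[x]."""
    n = len(ref)
    pos = np.clip(np.arange(n) + flow, 0, n - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    w = pos - lo
    return (1 - w) * ref[lo] + w * ref[hi]

ref = np.array([0., 1., 2., 3., 4., 5.])
cur = np.array([1., 2., 3., 4., 5., 5.])  # ref shifted left by one sample
flow = np.ones(6)                          # per-sample displacement field
pred = motion_compensate(ref, flow)        # motion-compensated prediction
residual = cur - pred                      # what the codec must still transmit
recon = pred + residual                    # decoder-side reconstruction
```

Here the flow matches the true shift exactly, so the residual is all zeros; in practice the residual carries whatever the motion model cannot predict, which is why the papers above spend so much effort on compressing it.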

Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model [article]

Haojie Liu, Ming Lu, Zhan Ma, Fan Wang, Zhihuang Xie, Xun Cao, Yao Wang
2020 arXiv   pre-print
in intra-frame pixels, inter-frame motions and inter-frame compensation residuals, respectively.  ...  global and local information, and 4) we introduce multi-module optimization and a multi-frame training strategy to minimize the temporal error propagation among P-frames.  ...  Additionally, joint spatiotemporal and hyper priors are aggregated for efficient and adaptive context modeling of latent features to improve entropy coding efficiency for the motion field.  ... 
arXiv:2007.04574v1 fatcat:gc5lnxixnvemdildw246wzaduq

H4D: Human 4D Modeling by Learning Neural Compositional Representation [article]

Boyan Jiang, Yinda Zhang, Xingkui Wei, Xiangyang Xue, Yanwei Fu
2022 arXiv   pre-print
Particularly, our representation, named H4D, represents a dynamic 3D human over a temporal span with the SMPL parameters of shape and initial pose, and latent codes encoding motion and auxiliary information  ...  A simple yet effective linear motion model is proposed to provide a rough and regularized motion estimation, followed by per-frame compensation for pose and geometry details with the residual encoded in  ...  The corresponding authors are Xiangyang Xue and Yanwei Fu.  ... 
arXiv:2203.01247v2 fatcat:rd62hdvrg5ccxdcc4s7bqpjkwa

3D Human Motion Estimation via Motion Compression and Refinement [article]

Zhengyi Luo, S. Alireza Golestaneh, Kris M. Kitani
2020 arXiv   pre-print
and a residual representation learned through motion refinement.  ...  We develop a technique for generating smooth and accurate 3D human pose and motion estimates from RGB video sequences.  ...  Acknowledgements: This project was sponsored in part by IARPA (D17PC00340), and JST AIP Acceleration Research Grant (JPMJCR20U1).  ... 
arXiv:2008.03789v2 fatcat:y3rx7wgwbfeo3ijddukftza6ri

Neural Inter-Frame Compression for Video Coding

Abdelaziz Djelouah, Joaquim Campos, Simone Schaub-Meyer, Christopher Schroers
2019 IEEE/CVF International Conference on Computer Vision (ICCV)
The key insight is that we can increase both decoding efficiency and reconstruction quality by encoding the required information into a latent representation that directly decodes into motion and blending  ...  In order to account for remaining prediction errors, residual information between the original image and the interpolated frame is needed.  ...  For entropy coding, we used the probability model proposed by Minnen et al. [17] to model image latents p_ŷ, latent residual values p_r, and motion information p_q.  ... 
doi:10.1109/iccv.2019.00652 dblp:conf/iccv/DjelouahCSS19 fatcat:36gzffkhozcajadlx3spjg4x7q

Generalized Difference Coder: A Novel Conditional Autoencoder Structure for Video Compression [article]

Fabian Brand, Jürgen Seiler, André Kaup
2021 arXiv   pre-print
In this paper, we provide a solid foundation based on information theory and Shannon entropy to show both the potential and the limits of conditional coding.  ...  The concept was established in traditional hybrid coding and successfully transferred to learning-based video compression.  ...  The resulting latent representation is coded and transmitted over the channel.  ... 
arXiv:2112.08011v1 fatcat:dxmjplkcinhhfl6ik5z6ukakt4
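The entropy argument behind comparing residual and conditional coding can be sketched in one line (the notation below is assumed for illustration, not quoted from the paper): for a current frame x and a decoder-side prediction x̃,

```latex
% Residual coding rate vs. conditional coding rate (sketch):
H(x - \tilde{x}) \;\ge\; H(x - \tilde{x} \mid \tilde{x}) \;=\; H(x \mid \tilde{x})
% First step: conditioning never increases entropy.
% Second step: given \tilde{x}, subtracting it is an invertible map,
% so the conditional entropies of x - \tilde{x} and x are equal.
% Hence conditional coding is never worse than residual coding in rate.
```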

VCT: A Video Compression Transformer [article]

Fabian Mentzer, George Toderici, David Minnen, Sung-Jin Hwang, Sergi Caelles, Mario Lucic, Eirikur Agustsson
2022 arXiv   pre-print
Instead, we independently map input frames to representations and use a transformer to model their dependencies, letting it predict the distribution of future representations given the past.  ...  Our approach is easy to implement, and we release code to facilitate future research.  ...  Acknowledgements We thank Basil Mustafa, Ashok Popat, Huiwen Chang, Phil Chou, Johannes Ballé, and Nick Johnston for the insightful discussions and feedback.  ... 
arXiv:2206.07307v1 fatcat:kfe5jcq5xfebrilessppqq7l7i
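The common thread between VCT and the temporal-context entropy models in several neighboring entries is that predicting each symbol's distribution from past context shrinks the ideal code length, −log₂ p(symbol). A toy illustration (the binary alphabet, the "sticky" predictor, and the symbol stream are all invented for this sketch):

```python
import math

def code_length_bits(symbols, predict):
    """Ideal total code length in bits when each symbol is entropy-coded
    under the distribution `predict` assigns given the symbols before it."""
    total = 0.0
    for i, s in enumerate(symbols):
        p = predict(symbols[:i])[s]   # model's probability for this symbol
        total += -math.log2(p)
    return total

def uniform(_past):                   # context-free model
    return {0: 0.5, 1: 0.5}

def sticky(past):                     # "temporal" model: expects repeats
    if not past:
        return {0: 0.5, 1: 0.5}
    prev = past[-1]
    return {prev: 0.9, 1 - prev: 0.1}

stream = [0] * 6 + [1] * 6            # long runs, like slowly changing video
print(code_length_bits(stream, uniform))  # 12.0 bits
print(code_length_bits(stream, sticky))   # ~5.84 bits: context pays off
```

The better the context model matches the true temporal statistics, the closer the coded rate gets to the conditional entropy; a transformer over past representations, as in VCT, is one way to learn such a model.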

Jointly Trained Image and Video Generation using Residual Vectors

Yatin Dandi, Aniket Das, Soumye Singhal, Vinay P. Namboodiri, Piyush Rai
2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
The joint training enables the image generator to exploit temporal information while the video generation model learns to flexibly share information across frames.  ...  The proposed approach models the variations in representations using residual vectors encoding the change at each time step over a summary vector for the entire video.  ...  Acknowledgements: PR acknowledges support from Visvesvaraya Young Faculty Fellowship and Research-I Foundation.  ... 
doi:10.1109/wacv45572.2020.9093308 dblp:conf/wacv/DandiDSNR20 fatcat:okk6ocpy6ndylk2vgrzgzuuctq

Jointly Trained Image and Video Generation using Residual Vectors [article]

Yatin Dandi, Aniket Das, Soumye Singhal, Vinay P. Namboodiri, Piyush Rai
2019 arXiv   pre-print
The joint training enables the image generator to exploit temporal information while the video generation model learns to flexibly share information across frames.  ...  The proposed approach models the variations in representations using residual vectors encoding the change at each time step over a summary vector for the entire video.  ...  For RJVAE, the latent code z^(t) = μ + δ^(t) input to G_I is of dimension 64.  ... 
arXiv:1912.07991v1 fatcat:z5qu4raa45fljpqx7tejqq3udu

Neural Video Compression using Spatio-Temporal Priors [article]

Haojie Liu, Tong Chen, Ming Lu, Qiu Shen, Zhan Ma
2019 arXiv   pre-print
motion and residuals.  ...  Spatial priors are generated using downscaled low-resolution features, while temporal priors (from previous reference frames and residuals) are captured using a convolutional neural network based long-short  ...  Fig. 1(b)); 2) temporal priors for frame reconstruction (cf. Fig. 1(c)); and 3) joint spatiotemporal priors for temporal predictive residual encoding (cf.  ... 
arXiv:1902.07383v2 fatcat:brynmcohtzdtdo3nyymhsshubi

Deep Predictive Video Compression using Mode-Selective Uni-and Bi-directional Predictions based on Multi-frame Hypothesis

Woonsung Park, Munchurl Kim
2020 IEEE Access  
Our DeepPVCnet jointly compresses motion information and residual data that are generated from the multi-scale structure via the feature transformation layers.  ...  Also, we propose a temporal-context-adaptive entropy model that utilizes the temporal context information of the reference frames for the current frame coding.  ...  By doing so, the multi-scale joint information of motions and residuals can be effectively fused for better compression.  ... 
doi:10.1109/access.2020.3046040 fatcat:bvfzpl26arfr5ffhh26qlbc5hm

Learned Video Compression with Feature-level Residuals

Runsen Feng, Yaojun Wu, Zongyu Guo, Zhizheng Zhang, Zhibo Chen
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
In this paper, we present an end-to-end video compression network for the P-frame challenge on CLIC.  ...  First, we notice that pixel-space residuals are sensitive to the prediction errors of optical-flow-based motion compensation.  ...  Acknowledgment This work was supported in part by NSFC under Grant U1908209, 61632001 and the National Key Research and Development Program of China 2018AAA0101400.  ... 
doi:10.1109/cvprw50498.2020.00068 dblp:conf/cvpr/FengWGZ020 fatcat:wag5jersijdgfhqeha5h3hlsny

Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders [article]

Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Zhenyu He, Linchao Bao
2021 arXiv   pre-print
However, splitting the latent code into two parts poses training difficulties for the VAE model.  ...  The shared code mainly models the strong correlation between audio and motion (such as the synchronized audio and motion beats), while the motion-specific code captures diverse motion information independent  ...  Different from [38, 43], our method disentangles the motion representation into the audio-motion shared information and motion-specific information to model the one-to-many mapping between audio and motion  ... 
arXiv:2108.06720v1 fatcat:cjognac7gbdavkfut2qu2kichu

Learned Video Compression via Joint Spatial-Temporal Correlation Exploration [article]

Haojie Liu, Han shen, Lichao Huang, Ming Lu, Tong Chen, Zhan Ma
2019 arXiv   pre-print
Traditional video compression technologies have been developed over decades in pursuit of higher coding efficiency. Efficient temporal information representation plays a key role in video coding.  ...  We suggest a one-stage learning approach to encapsulate flow as quantized features from consecutive frames, which are then entropy coded with adaptive contexts conditioned on joint spatial-temporal priors  ...  We proposed an end-to-end video compression framework using joint spatial-temporal priors to generate compact latent feature representations for intra texture, inter motion and sparse inter residual signals  ... 
arXiv:1912.06348v1 fatcat:h6chbcl52nbwtbpx6hrrzj7fme

Learned Video Compression via Joint Spatial-Temporal Correlation Exploration

Haojie Liu, Han Shen, Lichao Huang, Ming Lu, Tong Chen, Zhan Ma
2020 Proceedings of the AAAI Conference on Artificial Intelligence
Traditional video compression technologies have been developed over decades in pursuit of higher coding efficiency. Efficient temporal information representation plays a key role in video coding.  ...  We suggest a one-stage learning approach to encapsulate flow as quantized features from consecutive frames, which are then entropy coded with adaptive contexts conditioned on joint spatial-temporal priors  ...  We proposed an end-to-end video compression framework using joint spatial-temporal priors to generate compact latent feature representations for intra texture, inter motion and sparse inter residual signals  ... 
doi:10.1609/aaai.v34i07.6825 fatcat:naduixdarnfy3ebtcw55ht2h5e
Showing results 1–15 of 3,620