Improved Deep Distributed Light Field Coding

M. Umair Mukati, Milan Stepanov, Giuseppe Valenzise, Søren Forchhammer, Frédéric Dufaux
2021 IEEE Open Journal of Circuits and Systems  
Light fields enable increasing the degree of realism and immersion of the visual experience by capturing a scene with a higher number of dimensions than conventional 2D imaging. On the other hand, this higher dimensionality entails a significant storage and transmission overhead compared to traditional video. Conventional coding schemes achieve high coding gains by employing an asymmetric codec design, where the encoder is significantly more complex than the decoder. However, in the case of light fields, communication and processing among different cameras could be expensive, and the possibility of trading complexity between the encoder and the decoder becomes a desirable feature. We leverage the distributed source coding paradigm to effectively reduce the encoder's complexity at the cost of increased computation at the decoder side. Specifically, we train two deep neural networks to improve the two most critical parts of a distributed source coding scheme: the prediction of side information and the estimation of the uncertainty in that prediction. Experiments show considerable BD-rate gains, above 59% over HEVC-Intra and 17.45% over our previous method DLFC-I.

INDEX TERMS Deep learning, distributed source coding, light field, uncertainty estimation, view synthesis.

[...] occupying about 218 megabytes of hard disk space (i.e., a 15 × 15 set of views, 10 bit, three colour channels). Conventional video coding is designed as a hybrid block-based scheme including prediction, transformation, quantization and entropy coding [1]. The inclusion of prediction at the encoder side is the primary reason for its superior coding performance compared to purely transform-based coding. This framework, fitted to a broadcast scenario, is designed to provide efficient decoding at the cost of heavy computation at the encoder. On the contrary, there are scenarios where it is more desirable to have a power-efficient encoder and to transfer most of the computation to the decoder side. These scenarios typically include low-power camera systems, for example in wireless networks or multi-view video entertainment [2].
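As a rough sanity check of the storage figure quoted above, the raw size of such a light field can be computed directly. The short Python sketch below assumes a per-view resolution of 625 × 434 pixels, which is not stated in the excerpt but is a common resolution for lenslet light field datasets; with that assumption the result matches the quoted ~218 megabytes.

# Back-of-the-envelope check of the ~218 MB raw-size figure quoted above.
# The per-view resolution is an ASSUMPTION (625 x 434), not given in the text.
views = 15 * 15            # 15 x 15 grid of sub-aperture views
width, height = 625, 434   # assumed spatial resolution of each view
channels = 3               # three colour channels
bit_depth = 10             # bits per sample

total_bits = views * width * height * channels * bit_depth
total_bytes = total_bits / 8
print(f"{total_bytes / 2**20:.1f} MiB")   # -> approximately 218.3 MiB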
doi:10.1109/OJCAS.2021.3073252