Variable-Rate Deep Image Compression With Vision Transformers

Binglin Li, Jie Liang, Jingning Han
2022 IEEE Access  
Recently, vision transformers have been applied to many computer vision problems because of their long-range modeling ability, but they have not been thoroughly explored in image compression. We propose a patch-based learned image compression network that incorporates vision transformers. The input image is divided into patches before being fed to the encoder, and the patches reconstructed by the decoder are assembled into the complete image. Different kinds of transformer blocks (TransBlocks) are applied to meet the various requirements of the subnetworks. We also propose a transformer-based context model (TransContext) to facilitate coding based on previously decoded symbols. Since the computational complexity of the attention mechanism in transformers is a quadratic function of the sequence length, we partition the feature tensor into segments and apply the transformer within each segment to save computational cost. To alleviate compression artifacts, we use overlapping patches and apply an existing deblocking network to further remove the artifacts. Finally, a residual coding scheme is adopted to achieve variable bit rates. We show that our patch-based learned image compression with transformers obtains a 0.75 dB PSNR improvement at 0.15 bpp over the prior variable-rate compression work on the Kodak dataset. With the residual coding strategy, our framework maintains good PSNR performance, comparable to BPG (4:2:0). For MS-SSIM, we obtain higher results than BPG (4:4:4) across a range of bit rates (by 0.021 at 0.21 bpp) and than other variable-rate learned image compression models at low bit rates.

INDEX TERMS: Learned image compression, transformer, variable-rate.
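The overlapping-patch scheme mentioned in the abstract can be illustrated with a minimal sketch (this is not the authors' code; the patch size, overlap, and averaging-based merge are illustrative assumptions): the image is split with a stride smaller than the patch size, and overlapping regions are averaged when the decoded patches are stitched back together, which suppresses blocking artifacts at patch borders.

```python
import numpy as np

def extract_patches(img, patch, overlap):
    """Slide a patch-sized window with stride (patch - overlap).
    Assumes (H - patch) and (W - patch) are divisible by the stride,
    so the grid of patches covers the whole image."""
    stride = patch - overlap
    H, W = img.shape[:2]
    patches, coords = [], []
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            patches.append(img[y:y + patch, x:x + patch].copy())
            coords.append((y, x))
    return patches, coords

def merge_patches(patches, coords, shape, patch):
    """Stitch patches back, averaging wherever they overlap."""
    acc = np.zeros(shape, dtype=np.float64)
    cnt = np.zeros(shape, dtype=np.float64)
    for p, (y, x) in zip(patches, coords):
        acc[y:y + patch, x:x + patch] += p
        cnt[y:y + patch, x:x + patch] += 1
    return acc / cnt

# Round trip: with unmodified patches, merging recovers the image exactly.
img = np.arange(64, dtype=np.float64).reshape(8, 8)
patches, coords = extract_patches(img, patch=4, overlap=2)
rec = merge_patches(patches, coords, img.shape, patch=4)
print(np.allclose(rec, img))  # True
```

In the compression setting each patch would pass through the learned encoder/decoder before merging, so the averaging in `merge_patches` blends the (slightly different) reconstructions of the overlapped borders.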
doi:10.1109/access.2022.3173256
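The segment-wise attention idea from the abstract (restricting attention to within segments of the sequence to cut the quadratic cost from O(L²) to O(L²/S) for S segments) can be sketched as follows. This is a simplified illustration, not the paper's implementation: query/key/value projections and multi-head splitting are omitted, and the segment count is an assumed parameter.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def segment_attention(x, num_segments):
    """x: (L, d) token features. Attention is computed independently
    within each of `num_segments` contiguous segments, so each score
    matrix is (L/S, L/S) instead of (L, L)."""
    L, d = x.shape
    assert L % num_segments == 0, "sequence must split evenly"
    seg_len = L // num_segments
    out = np.empty_like(x)
    for s in range(num_segments):
        seg = x[s * seg_len:(s + 1) * seg_len]      # (seg_len, d)
        scores = seg @ seg.T / np.sqrt(d)           # (seg_len, seg_len)
        out[s * seg_len:(s + 1) * seg_len] = softmax(scores) @ seg
    return out

x = np.random.randn(256, 64).astype(np.float32)
y = segment_attention(x, num_segments=4)
print(y.shape)  # (256, 64)
```

With 4 segments of length 64, each attention matrix has 64² entries instead of 256², a 16× reduction in score computation, at the cost of no interaction across segment boundaries.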