Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images

Wuyang Chen, Ziyu Jiang, Zhangyang Wang, Kexin Cui, Xiaoning Qian
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Figure 1: Inference memory and mean intersection over union (mIoU) accuracy on the DeepGlobe dataset [1]. (a): Comparison of best achievable mIoU vs. memory for different segmentation methods. (b): mIoU/memory with different global image sizes (downsampling rate shown in scale annotations). (c): mIoU/memory with
different local patch sizes (normalized patch size shown in scale annotations). GLNet (red dots) integrates both global and local information in a compact way, contributing to a well-balanced trade-off between accuracy and memory usage. See Section 4 for experiment details. Methods studied: ICNet [2], DeepLabv3+ [3], FPN [4], FCN-8s [5], UNet [6], PSPNet [7], SegNet [8], and the proposed GLNet.

Abstract

Segmentation of ultra-high resolution images is increasingly demanded, yet poses significant challenges for algorithm efficiency, in particular considering (GPU) memory limits. Current approaches either downsample an ultra-high resolution image or crop it into small patches for separate processing. Either way, the loss of local fine details or global contextual information results in limited segmentation accuracy. We propose collaborative Global-Local Networks (GLNet) to effectively preserve both global and local information in a highly memory-efficient manner. GLNet is composed of a global branch and a local branch, taking the downsampled entire image and its cropped local patches as respective inputs. For segmentation, GLNet deeply fuses feature maps from the two branches, capturing both the high-resolution fine structures from zoomed-in local patches and the contextual dependency from the downsampled input. To further resolve the potential class imbalance between background and foreground regions, we present a coarse-to-fine variant of GLNet that is also memory-efficient. Extensive experiments and analyses have been performed on three real-world ultra-high-resolution aerial and medical image datasets (resolution up to 30 million pixels). With a single 1080Ti GPU and less than 2 GB of memory, our GLNet yields high-quality segmentation results and achieves a much more competitive accuracy-memory trade-off than state-of-the-art methods.

* The first two authors contributed equally.
doi:10.1109/cvpr.2019.00913 dblp:conf/cvpr/ChenJWCQ19
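To make the two-branch idea described in the abstract concrete, below is a minimal PyTorch-style sketch of a global-local fusion model: one branch encodes the downsampled full image, the other encodes a full-resolution patch, and the global features at the patch location are cropped, upsampled, and concatenated with the local features before a segmentation head. The tiny backbone, the concatenation-based fusion, and the crop arithmetic are illustrative assumptions for exposition only, not the exact GLNet architecture or feature-sharing scheme of the paper.

```python
# Illustrative sketch of two-branch global-local feature fusion.
# The backbone, fusion by concatenation, and crop arithmetic are assumptions,
# not the authors' exact GLNet design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBackbone(nn.Module):
    """Small convolutional encoder standing in for the paper's backbone."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)  # feature map at 1/4 of the input resolution


class GlobalLocalFusionNet(nn.Module):
    """Global branch sees the downsampled full image; local branch sees a
    full-resolution patch. Global features at the patch location are cropped,
    upsampled, and concatenated with local features before the head."""
    def __init__(self, num_classes=7, feat_ch=64):
        super().__init__()
        self.global_branch = TinyBackbone(feat_ch=feat_ch)
        self.local_branch = TinyBackbone(feat_ch=feat_ch)
        self.head = nn.Conv2d(2 * feat_ch, num_classes, 1)

    def forward(self, global_img, local_patch, patch_box):
        # patch_box: (top, left, height, width) of the patch location,
        # expressed in the coordinate frame of the downsampled global image.
        g_feat = self.global_branch(global_img)   # (B, C, Hg/4, Wg/4)
        l_feat = self.local_branch(local_patch)   # (B, C, Hp/4, Wp/4)

        # Crop the region of the global feature map corresponding to the patch.
        top, left, h, w = patch_box
        scale_h = g_feat.shape[2] / global_img.shape[2]
        scale_w = g_feat.shape[3] / global_img.shape[3]
        t, l_ = int(top * scale_h), int(left * scale_w)
        b = max(t + 1, int((top + h) * scale_h))
        r = max(l_ + 1, int((left + w) * scale_w))
        g_crop = g_feat[:, :, t:b, l_:r]

        # Upsample the cropped global context to the local resolution and fuse.
        g_crop = F.interpolate(g_crop, size=l_feat.shape[2:], mode="bilinear",
                               align_corners=False)
        logits = self.head(torch.cat([g_crop, l_feat], dim=1))

        # Predict at the patch's original resolution.
        return F.interpolate(logits, size=local_patch.shape[2:], mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    model = GlobalLocalFusionNet(num_classes=7)
    global_img = torch.randn(1, 3, 512, 512)    # downsampled full image
    local_patch = torch.randn(1, 3, 512, 512)   # one full-resolution crop
    # The 512x512 patch covers a 128x128 region of the downsampled global image.
    out = model(global_img, local_patch, patch_box=(128, 256, 128, 128))
    print(out.shape)  # torch.Size([1, 7, 512, 512])
```

Memory efficiency in this sketch comes from never running the backbone on the full-resolution image: the global branch only ever sees the downsampled input, while the local branch processes one bounded-size patch at a time, which is the general trade-off Figure 1 illustrates.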