Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference

Thomas Verelst, Tinne Tuytelaars
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Modern convolutional neural networks apply the same operations on every pixel in an image. However, not all image regions are equally important. To address this inefficiency, we propose a method to dynamically apply convolutions conditioned on the input image. We introduce a residual block where a small gating branch learns which spatial positions should be evaluated. These discrete gating decisions are trained end-to-end using the Gumbel-Softmax trick, in combination with a sparsity criterion.
Our experiments on CIFAR, ImageNet, Food-101 and MPII show that our method has better focus on the region of interest and better accuracy than existing methods, at a lower computational complexity. Moreover, we provide an efficient CUDA implementation of our dynamic convolutions using a gather-scatter approach, achieving a significant improvement in inference speed on MobileNetV2 and ShuffleNetV2. On human pose estimation, a task that is inherently spatially sparse, the processing speed is increased by 60% with no loss in accuracy.
doi:10.1109/cvpr42600.2020.00239 dblp:conf/cvpr/VerelstT20 fatcat:uesc4pdzj5hrtdtbr5zd6ypubm