Local Relation Networks for Image Recognition [article]

Han Hu and Zheng Zhang and Zhenda Xie and Stephen Lin
2019 arXiv pre-print
The convolution layer has been the dominant feature extractor in computer vision for years. However, the spatial aggregation in convolution is basically a pattern matching process that applies fixed filters which are inefficient at modeling visual elements with varying spatial distributions. This paper presents a new image feature extractor, called the local relation layer, that adaptively determines aggregation weights based on the compositional relationship of local pixel pairs. With this ...tional approach, it can composite visual elements into higher-level entities in a more efficient manner that benefits semantic inference. A network built with local relation layers, called the Local Relation Network (LR-Net), is found to provide greater modeling capacity than its counterpart built with regular convolution on large-scale recognition tasks such as ImageNet classification.
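To make the contrast with fixed convolution filters concrete, here is a minimal sketch of a layer whose aggregation weights are computed per pixel pair rather than stored as a fixed kernel. It is an illustration under stated assumptions, not the paper's implementation. The function and parameter names (`local_relation_layer`, `w_query`, `w_key`, `geometry_bias`) are hypothetical. The pair relation is assumed to be a dot product between a query projection of the center pixel and a key projection of each neighbor in a k×k window. A per-offset scalar bias is assumed as a stand-in for a geometric term, and the weights are assumed to be softmax-normalized over the window. The paper's exact formulation may differ.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # indexed [row][col] -> channel vector


def matvec(m: List[List[float]], v: Vector) -> Vector:
    """Multiply a (d x C) matrix by a length-C vector (acts like a 1x1 conv)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def local_relation_layer(
    x: FeatureMap,
    w_query: List[List[float]],
    w_key: List[List[float]],
    geometry_bias: Dict[Tuple[int, int], float],
    k: int = 3,
) -> FeatureMap:
    """Hypothetical sketch: aggregate each k x k window with weights derived
    from pixel-pair relations (assumed dot product + assumed offset bias)."""
    h, w = len(x), len(x[0])
    r = k // 2
    # Project every pixel once into (assumed) query and key spaces.
    q = [[matvec(w_query, x[i][j]) for j in range(w)] for i in range(h)]
    key = [[matvec(w_key, x[i][j]) for j in range(w)] for i in range(h)]
    out: FeatureMap = []
    for i in range(h):
        row = []
        for j in range(w):
            logits, values = [], []
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ni, nj = i + di, j + dj
                    if not (0 <= ni < h and 0 <= nj < w):
                        continue  # skip neighbours outside the image
                    # Appearance relation between the center and this neighbour.
                    relation = sum(a * b for a, b in zip(q[i][j], key[ni][nj]))
                    # Add the (assumed) learned bias for this relative offset.
                    logits.append(relation + geometry_bias.get((di, dj), 0.0))
                    values.append(x[ni][nj])
            # Softmax over the window (max-subtracted for numerical stability).
            m = max(logits)
            exps = [math.exp(z - m) for z in logits]
            total = sum(exps)
            weights = [e / total for e in exps]
            channels = len(x[i][j])
            row.append([sum(wt * v[c] for wt, v in zip(weights, values))
                        for c in range(channels)])
        out.append(row)
    return out
```

With all-zero projections and no bias, every neighbor gets the same weight, so the layer reduces to a plain window average. Unlike a convolution, the weights change with the input content through the query/key relation, which is the adaptivity the abstract describes.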
arXiv:1904.11491v1 fatcat:r4iu5cespnbx3debffoqf6kxee