What You See Is What You Transform: Foveated Spatial Transformers as a bio-inspired attention mechanism [post]

Ghassan Dabane, Laurent Perrinet, Emmanuel Daucé
2021 unpublished
Convolutional Neural Networks have been the go-to option for object recognition in computer vision for the past several years. However, their invariance to object translations remains a weak point, limited to the small translations absorbed by their max-pooling layers. One bio-inspired approach uses the What/Where pathway separation found in mammals to overcome this limitation, acting as a nature-inspired attention mechanism. Another classical approach is Spatial Transformers, which allow adaptive end-to-end learning of different classes of spatial transformations during training. In this work, we review Spatial Transformers as an attention-only mechanism and compare them with the What/Where model. We show that attention-restricted, or "Foveated", Spatial Transformer Networks, coupled with a curriculum learning training scheme and an efficient log-polar visual input space, outperform the What/Where model, all without the need for any extra supervision whatsoever.
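The log-polar visual input mentioned above resamples an image so that resolution is dense at the center of gaze (the fovea) and coarse in the periphery. As a minimal illustration, the following numpy sketch builds such a sampling grid with nearest-neighbour lookup; all function names and parameter choices here are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def log_polar_grid(h, w, out_r=32, out_theta=32, r_min=1.0):
    """Build (y, x) sampling coordinates mapping an h x w image, centered
    at its middle, onto a log-polar grid: rows index log-radius, columns
    index angle. Radii are spaced uniformly in log space, so sampling is
    dense near the center (fovea) and sparse in the periphery.
    Illustrative sketch only; names and defaults are assumptions."""
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    radii = np.exp(np.linspace(np.log(r_min), np.log(r_max), out_r))
    thetas = np.linspace(0.0, 2.0 * np.pi, out_theta, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    ys = cy + rr * np.sin(tt)
    xs = cx + rr * np.cos(tt)
    return ys, xs

def sample_log_polar(img, out_r=32, out_theta=32):
    """Nearest-neighbour log-polar resampling of a 2-D grayscale image."""
    h, w = img.shape
    ys, xs = log_polar_grid(h, w, out_r, out_theta)
    yi = np.clip(np.round(ys).astype(int), 0, h - 1)
    xi = np.clip(np.round(xs).astype(int), 0, w - 1)
    return img[yi, xi]
```

A useful side effect of this representation is that a rotation of the input image about the grid center becomes a circular shift along the angle axis, which is one reason log-polar entries pair well with translation-centering attention mechanisms.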
doi:10.36227/techrxiv.16550391.v1