
Group Equivariant Stand-Alone Self-Attention For Vision [article]

David W. Romero, Jean-Baptiste Cordonnier
2021 arXiv   pre-print
We provide a general self-attention formulation to impose group equivariance to arbitrary symmetry groups.  ...  Our experiments on vision benchmarks demonstrate consistent improvements of GSA-Nets over non-equivariant self-attention networks.  ...  GROUP EQUIVARIANT STAND-ALONE SELF-ATTENTION In §4.3 we concluded that unique G-equivariance is induced in self-attention by introducing positional encodings which are invariant to the action of G but  ... 
arXiv:2010.00977v2 fatcat:eb5ljnwvnbhq7ae3perf4eaxbi

Scaling Local Self-Attention for Parameter Efficient Visual Backbones [article]

Ashish Vaswani, Prajit Ramachandran, Aravind Srinivas, Niki Parmar, Blake Hechtman, Jonathon Shlens
2021 arXiv   pre-print
Self-attention has the promise of improving computer vision systems due to parameter-independent scaling of receptive fields and content-dependent interactions, in contrast to parameter-dependent scaling  ...  We propose two extensions to self-attention that, in conjunction with a more efficient implementation of self-attention, improve the speed, memory usage, and accuracy of these models.  ...  Acknowledgements We would like to thank David Fleet for valuable discussions.  ... 
arXiv:2103.12731v3 fatcat:lmc4awfaizeg5mqb7p3b6pcf6y

LieTransformer: Equivariant self-attention for Lie Groups [article]

Michael Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, Hyunjik Kim
2021 arXiv   pre-print
In this work, we extend the scope of the literature to self-attention, that is emerging as a prominent building block of deep learning models.  ...  Such works have mostly focused on group equivariant convolutions, building on the result that group equivariant linear maps are necessarily convolutions.  ...  Kosiorek We would also like to thank the Python community (Van Rossum & Drake Jr, 1995; Oliphant, 2007) for developing the tools that enabled this work, including PyTorch (Paszke et al., 2017), NumPy  ... 
arXiv:2012.10885v4 fatcat:kzafci4bpfgrzodjpvobfmneee

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation [article]

Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
2020 arXiv   pre-print
In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves 2.8% PQ over bottom-up state-of-the-art on COCO test-dev.  ...  Convolution exploits locality for efficiency at a cost of missing long range context. Self-attention has been adopted to augment CNNs with non-local interactions.  ...  for technical support.  ... 
arXiv:2003.07853v2 fatcat:5v2u47653janpexw4gherlhjea

Contextual Transformer Networks for Visual Recognition [article]

Yehao Li and Ting Yao and Yingwei Pan and Tao Mei
2021 arXiv   pre-print
in numerous computer vision tasks.  ...  Nevertheless, most of existing designs directly employ self-attention over a 2D feature map to obtain the attention matrix based on pairs of isolated queries and keys at each spatial location, but leave  ...  [40, 58] present a stand-alone design of local self-attention module, which can completely replace the spatial convolutions in ResNet architectures.  ... 
arXiv:2107.12292v1 fatcat:ws7hou7o5rhtfkc7srs5s4t2gm

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [article]

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
2021 arXiv   pre-print
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.  ...  In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place.  ...  Stand-alone self-attention in vision models. In NeurIPS, 2019. Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta.  ... 
arXiv:2010.11929v2 fatcat:myedumsklfcidim27uii6plwq4

Residual Pathway Priors for Soft Equivariance Constraints [article]

Marc Finzi, Gregory Benton, Andrew Gordon Wilson
2021 arXiv   pre-print
Using RPPs, we construct neural network priors with inductive biases for equivariances, but without limiting flexibility.  ...  models for model-based RL.  ...  Acknowledgements We thank Samuel Stanton for useful discussion and feedback.  ... 
arXiv:2112.01388v1 fatcat:yiyyyn6435bgdmgjen52phb3pu

X-volution: On the unification of convolution and self-attention [article]

Xuanhong Chen and Hang Wang and Bingbing Ni
2021 arXiv   pre-print
In this work, we theoretically derive a global self-attention approximation scheme, which approximates a self-attention via the convolution operation on transformed features.  ...  Convolution and self-attention are acting as two fundamental building blocks in deep neural networks, where the former extracts local image features in a linear way while the latter non-locally encodes  ...  Note that, the performance of stand-alone self-attention operator inserted into ResNet is worse than convolution, indicating that the naive introduction of self-attention operator has little effect on  ... 
arXiv:2106.02253v2 fatcat:dxl7vbutgzg2xa42e2hxipnf5a

[Re] On the Relationship between Self-Attention and Convolutional Layers

Mukund Varma, Nishant Prabhu
2021 Zenodo  
In another setting (Learned embedding w/ content), both the positional and content based attention information is used which corresponds to a full-blown stand alone self-attention model.  ...  vision tasks.  ... 
doi:10.5281/zenodo.5217601 fatcat:xqcpf32xvrflrp572fcpu23bq4

Global Self-Attention Networks for Image Recognition [article]

Zhuoran Shen, Irwan Bello, Raviteja Vemulapalli, Xuhui Jia, Ching-Hui Chen
2020 arXiv   pre-print
Recently, a series of works in computer vision have shown promising results on various image and video understanding tasks using self-attention.  ...  However, due to the quadratic computational and memory complexities of self-attention, these works either apply attention only to low-resolution feature maps in later stages of a deep network or restrict  ...  ACKNOWLEDGEMENT We sincerely thank Huiyu Wang, Yukun Zhu, and Zhichao Lu for their discussion and support.  ... 
arXiv:2010.03019v2 fatcat:k2kilixevndqdoaz5zsib2clva

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges [article]

Michael M. Bronstein, Joan Bruna, Taco Cohen, Petar Veličković
2021 arXiv   pre-print
Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale  ...  learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for  ...  We hope that our perspective will make it easier both for newcomers and practitioners to navigate the field, and for researchers to synthesise novel architectures, as instances of our blueprint.  ... 
arXiv:2104.13478v2 fatcat:odbzfsau6bbwbhulc233cfsrom

Transformers in Vision: A Survey [article]

Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah
2021 arXiv   pre-print
., self-attention, large-scale pre-training, and bidirectional encoding.  ...  Different from convolutional networks, Transformers require minimal inductive biases for their design and are naturally suited as set-functions.  ...  We would also like to thank Mohamed Afham for his help with a figure.  ... 
arXiv:2101.01169v4 fatcat:ynsnfuuaize37jlvhsdki54cy4

Protein sequence-to-structure learning: Is this the end(-to-end revolution)? [article]

Elodie Laine, Stephan Eismann, Arne Elofsson, Sergei Grudinin
2021 arXiv   pre-print
The potential of deep learning has been recognized in the protein structure prediction community for some time, and became indisputable after CASP13.  ...  ; (iii) equivariant architectures preserving the symmetry of 3D space; (iv) use of large meta-genome databases; (v) combinations of protein representations; (vi) and finally truly end-to-end architectures  ...  The authors thank Kliment Olechnovič from Vilnius University for his help with illustrating Voronoi cells and proof-reading the manuscript, and Bowen Jing for his feedback on the manuscript.  ... 
arXiv:2105.07407v2 fatcat:6szubg7q2rajlj3l4vyzqri3nm

Pose is all you need: The pose only group activity recognition system (POGARS) [article]

Haritha Thilakarathne, Aiden Nibali, Zhen He, Stuart Morgan
2021 arXiv   pre-print
The proposed model uses a spatial and temporal attention mechanism to infer person-wise importance and multi-task learning for simultaneously performing group and individual action classification.  ...  In contrast to existing approaches for group activity recognition, POGARS uses 1D CNNs to learn spatiotemporal dynamics of individuals involved in a group activity and forgo learning features from pixel  ...  POGARS uses a spatial self attention mechanism for identifying the importance of each individual for the particular group activity. In addition, temporal attention is also important.  ... 
arXiv:2108.04186v1 fatcat:egelbg3uw5bfjit7vrtiindi74

AGMB-Transformer: Anatomy-Guided Multi-Branch Transformer Network for Automated Evaluation of Root Canal Therapy [article]

Yunxiang Li, Guodong Zeng, Yifan Zhang, Jun Wang, Qianni Zhang, Qun Jin, Lingling Sun, Qisi Lian, Neng Xia, Ruizi Peng, Kai Tang, Yaqi Wang (+1 other)
2021 arXiv   pre-print
Moreover, a branch fusion module and a multi-branch structure including our progressive Transformer and Group Multi-Head Self-Attention (GMHSA) are designed to focus on both global and local features for  ...  In this paper, we aim to automate this process by leveraging the advances in computer vision and artificial intelligence, to provide an objective and accurate method for root canal therapy result assessment  ...  Group Multi-Head Self-Attention is embedded in our progressive Transformer.  ... 
arXiv:2105.00381v2 fatcat:p4dzuvndn5h2fhw5lplh7yexwa