
SP-ViT: Learning 2D Spatial Priors for Vision Transformers [article]

Yuxuan Zhou, Wangmeng Xiang, Chao Li, Biao Wang, Xihan Wei, Lei Zhang, Margret Keuper, Xiansheng Hua
2022 arXiv pre-print
In this work, we present Spatial Prior-enhanced Self-Attention (SP-SA), a novel variant of vanilla Self-Attention (SA) tailored for vision transformers.  ...  Our largest model, SP-ViT-L, achieves a record-breaking 86.3% Top-1 accuracy while reducing the number of parameters by almost 50% compared to the previous state-of-the-art model (150M for SP-ViT-L vs. 271M  ...  enhanced by a combination of learned 2D Spatial Priors (SPs), called Spatial Prior-enhanced Self-Attention (SP-SA).  ...
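The excerpt describes self-attention whose scores are enhanced by a combination of learned 2D spatial priors. A minimal sketch of one plausible reading — the priors entering the attention logits as a weighted additive bias — is shown below. The function name `sp_self_attention`, the additive-bias combination, and the mixing weights `alphas` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sp_self_attention(Q, K, V, priors, alphas):
    """Self-attention with learned 2D spatial priors (illustrative sketch).

    Q, K, V : (n, d) token embeddings (n tokens, dimension d)
    priors  : (k, n, n) stack of k spatial prior maps over token pairs
    alphas  : (k,) learned weights combining the priors
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # vanilla scaled dot-product logits
    bias = np.tensordot(alphas, priors, 1)  # (n, n) combined spatial prior
    attn = softmax(scores + bias, axis=-1)  # prior-biased attention weights
    return attn @ V
```

In a trained model, `priors` would typically encode 2D relationships between patch positions (e.g. relative distances on the patch grid) and `alphas` would be learned per head; here both are left as plain arrays.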
arXiv:2206.07662v1