
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention [article]

Sitong Wu, Tianyi Wu, Haoru Tan, Guodong Guo
2021 arXiv pre-print
Meanwhile, it can capture richer contextual information at a computational complexity similar to that of previous local self-attention mechanisms.  ...  To reduce the quadratic computational complexity of global self-attention, various methods restrict attention to a local region to improve efficiency.  ...  MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens. arXiv preprint  ...  arXiv:2105.03889. Radosavovic, I.; Kosaraju, R.  ... 
arXiv:2112.14000v1 fatcat:mhn3mkrdwner7eswgpklwvip6u
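The snippet's point about the quadratic cost of global self-attention can be made concrete by counting query-key pairs on a token grid. The sketch below is a generic illustration only. It assumes non-overlapping square m x m windows (a Swin-style local scheme), not the paper's pale-shaped regions, and the function names are hypothetical.

```python
def global_attention_pairs(h: int, w: int) -> int:
    """Query-key pairs scored by global self-attention over an h x w token grid."""
    n = h * w
    return n * n  # every token attends to every token: quadratic in n


def window_attention_pairs(h: int, w: int, m: int) -> int:
    """Query-key pairs when attention is restricted to non-overlapping m x m windows.

    Hypothetical simplification: h and w must be divisible by m.
    """
    if h % m or w % m:
        raise ValueError("h and w must be divisible by the window size m")
    num_windows = (h // m) * (w // m)
    tokens_per_window = m * m
    # Each window does full attention internally; total = h*w*m*m, linear in h*w.
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    # Compare the two counts for growing feature-map sizes with a 7 x 7 window.
    for side in (14, 28, 56):
        g = global_attention_pairs(side, side)
        loc = window_attention_pairs(side, side, 7)
        print(f"{side}x{side}: global={g:,} window={loc:,} ratio={g // loc}")
```

The ratio between the two counts is (h*w)/(m*m), so doubling the feature-map side quadruples the savings from restricting attention locally. The trade-off the snippet alludes to is that each token then sees less context, which is what larger or differently shaped attention regions try to recover.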