
SMYRF: Efficient Attention using Asymmetric Clustering [article]

Giannis Daras, Nikita Kitaev, Augustus Odena, Alexandros G. Dimakis
2020 arXiv   pre-print
Our algorithm, SMYRF, uses Locality Sensitive Hashing (LSH) in a novel way by defining new Asymmetric transformations and an adaptive scheme that produces balanced clusters.  ...  Notably, SMYRF-BERT outperforms (slightly) BERT on GLUE, while using 50% less memory. We also show that SMYRF can be used interchangeably with dense attention before and after training.  ...  Acknowledgements We would like to wholeheartedly thank the TensorFlow Research Cloud (TFRC) program that gave us access to v3-8 Cloud TPUs and GCP credits that we used to run our Computer Vision experiments  ... 
arXiv:2010.05315v1 fatcat:bpnfx7ii3jearelzr6h75bp77i
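The core idea in the snippet above — hash queries and keys, then form balanced, equal-size clusters and attend only within each cluster — can be illustrated with a toy sketch. Note this is only an illustration of balanced LSH clustering using a plain random-hyperplane projection; SMYRF's actual asymmetric transformations and adaptive scheme are more involved, and the function name `balanced_lsh_attention` is our own, not from the paper.

```python
import numpy as np

def balanced_lsh_attention(Q, K, V, n_clusters):
    """Toy sketch: sort queries and keys by a shared random projection,
    split each ordering into equal-size clusters, and compute dense
    softmax attention only inside each (query-cluster, key-cluster) pair."""
    n, d = Q.shape
    w = np.random.default_rng(0).standard_normal(d)  # one hashing hyperplane
    q_order = np.argsort(Q @ w)                      # queries sorted by hash
    k_order = np.argsort(K @ w)                      # keys sorted by hash
    out = np.empty_like(V)
    for qi, ki in zip(np.array_split(q_order, n_clusters),
                      np.array_split(k_order, n_clusters)):
        scores = Q[qi] @ K[ki].T / np.sqrt(d)
        attn = np.exp(scores - scores.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)      # softmax within cluster
        out[qi] = attn @ V[ki]
    return out
```

With `n_clusters=1` this reduces to ordinary dense attention; larger values trade accuracy for memory, which is the regime the abstract's 50%-memory claim refers to.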

Visual Attention Methods in Deep Learning: An In-Depth Survey [article]

Mohammed Hassanin, Saeed Anwar, Ibrahim Radwan, Fahad S Khan, Ajmal Mian
2022 arXiv   pre-print
However, the literature lacks a comprehensive survey specific to attention techniques to guide researchers in employing attention in their deep models.  ...  Furthermore, multiple complementary attention mechanisms can be incorporated in one network. Hence, attention techniques have become extremely attractive.  ...  We thank Professor Mubarak Shah for his useful comments that significantly improved the presentation of the survey.  ... 
arXiv:2204.07756v2 fatcat:rpmffb2xsrferdd3ijxxpmjati

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [article]

Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu
2021 arXiv   pre-print
Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using Fast Fourier Transform  ...  Since in many state-of-the-art models, relative positional encoding is used as default, designing efficient Transformers that can incorporate RPE is appealing.  ...  To show the effectiveness of our proposed NPRF-Transformer with RPE, we choose several competitive baselines in the literature, including: Linformer [48] , Nyströmformer [49] , SMYRF [7] , and Fast Clustered  ... 
arXiv:2106.12566v2 fatcat:hz6wtrcmwvdbfmzmlu2d5noyo4
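The FFT trick referenced in the snippet rests on a standard fact: a Toeplitz matrix (constant along diagonals, as a relative-positional bias matrix is) can be embedded in a circulant matrix of twice the size, and circulant matrix-vector products diagonalize under the FFT, giving O(n log n) instead of O(n²). A minimal NumPy sketch of that building block (the helper name is ours, not the paper's API):

```python
import numpy as np

def toeplitz_matvec_fft(c, r, x):
    """Compute T @ x in O(n log n), where T is the n x n Toeplitz matrix
    with first column c and first row r (requires r[0] == c[0]).
    T is embedded in a 2n x 2n circulant, which the FFT diagonalizes."""
    n = len(x)
    # first column of the embedding circulant: [c, 0, reversed r[1:]]
    circ = np.concatenate([c, [0.0], r[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(x_pad))[:n]
    return y.real
```

In the paper's setting the same idea is applied so that the relative-positional bias term of kernelized attention costs O(n log n) rather than materializing the full n × n Toeplitz matrix.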

Solving Inverse Problems with NerfGANs [article]

Giannis Daras, Wen-Sheng Chu, Abhishek Kumar, Dmitry Lagun, Alexandros G. Dimakis
2021 arXiv   pre-print
We introduce a novel framework for solving inverse problems using NeRF-style generative models.  ...  Smyrf: Efficient attention using asymmetric clustering. arXiv preprint arXiv:2010.05315, 2020.  ...  [20] Abhishek Kumar and Ehsan Amid. Constrained instance and class reweighting for robust learning under label noise, 2021.  ... 
arXiv:2112.09061v1 fatcat:ltoz42xkvzcvzexlvqtmvsyray