A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
PermuteFormer: Efficient Relative Position Encoding for Long Sequences
2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
unpublished
A recent variation of Transformer, Performer, scales Transformer to longer sequences with a linear attention mechanism. However, it is not compatible with relative position encoding, which has advantages over absolute position encoding. In this paper, we discuss possible ways to add relative position encoding to Performer. Based on the analysis, we propose PermuteFormer, a Performer-based model with relative position encoding that scales linearly on long sequences. PermuteFormer applies a position-dependent permutation to the features of queries and keys in each attention head, encoding relative positional information into the attention module without affecting its linear complexity.
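To make the mechanism described in the abstract concrete, the following is a minimal sketch, not the authors' code: Performer-style linear attention in which each position's query/key feature vector is permuted a position-dependent number of times. A cyclic shift stands in for a generic base permutation, and the function name, shapes, and toy inputs are illustrative assumptions.

    import numpy as np

    def permuted_linear_attention(q_feat, k_feat, v):
        # q_feat, k_feat: (n, m) non-negative feature maps phi(Q), phi(K)
        # v:              (n, d) value vectors
        n, m = q_feat.shape

        # Position i applies the base permutation i times; for a cyclic shift
        # that is a roll by i. Since permutation matrices are orthogonal,
        # (phi(q_i) P^i) . (phi(k_j) P^j)^T = phi(q_i) P^(i-j) phi(k_j)^T,
        # so attention scores depend only on the relative position i - j.
        qp = np.stack([np.roll(q_feat[i], i) for i in range(n)])
        kp = np.stack([np.roll(k_feat[i], i) for i in range(n)])

        # Standard linear-attention contraction: O(n * m * d) rather than O(n^2 * d).
        kv = kp.T @ v                     # (m, d)
        z = kp.sum(axis=0)                # (m,)
        return (qp @ kv) / (qp @ z)[:, None]

    # Toy usage with random non-negative features, as in Performer's feature maps.
    rng = np.random.default_rng(0)
    n, m, d = 6, 8, 4
    out = permuted_linear_attention(rng.random((n, m)), rng.random((n, m)), rng.random((n, d)))
    print(out.shape)  # (6, 4)

Because the permutation is applied to the feature dimension only, the sketch keeps the O(n) cost of linear attention while the scores become a function of relative position, which is the property the abstract highlights.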
doi:10.18653/v1/2021.emnlp-main.828
fatcat:5x4obwt47vd7jmsfxczxwawfqy