Multi-Scale Self-Attention for Text Classification

Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, Zheng Zhang
<span title="2020-04-03">2020</span> <i title="Association for the Advancement of Artificial Intelligence (AAAI)"> <a target="_blank" rel="noopener" href="" style="color: black;">PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE</a> </i> &nbsp;
In this paper, we introduce the prior knowledge, multi-scale structure, into self-attention modules. We propose a Multi-Scale Transformer which uses multi-scale multi-head self-attention to capture features from different scales. Based on the linguistic perspective and the analysis of pre-trained Transformer (BERT) on a huge corpus, we further design a strategy to control the scale distribution for each layer. Results of three different kinds of tasks (21 datasets) show our Multi-Scale
more &raquo; ... er outperforms the standard Transformer consistently and significantly on small and moderate size datasets.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="">doi:10.1609/aaai.v34i05.6290</a> <a target="_blank" rel="external noopener" href="">fatcat:p3ewzpzbmzgqveg5wvdposfzai</a> </span>
<a target="_blank" rel="noopener" href="" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href=""> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / </button> </a>