
Self-Attention: A Better Building Block for Sentiment Analysis Neural Network Classifiers

Artaches Ambartsoumian, Fred Popowich
<span title="">2018</span> <i title="Association for Computational Linguistics"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/5w4agftuibe35hbonabjurfqf4" style="color: black;">Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</a> </i> &nbsp;
Recently, a new category of neural networks, self-attention networks (SANs), has been created, which utilizes the attention mechanism as the basic building block.  ...  Sentiment analysis has seen much progress in the past two decades. For the past few years, neural network approaches, primarily RNNs and CNNs, have been the most successful for this task.  ...  Acknowledgments: We thank the anonymous reviewers for their insightful suggestions.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.18653/v1/w18-6219">doi:10.18653/v1/w18-6219</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/wassa/AmbartsoumianP18.html">dblp:conf/wassa/AmbartsoumianP18</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/u74iyv56cnftlpxeaz22bud4wa">fatcat:u74iyv56cnftlpxeaz22bud4wa</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200507194158/https://www.aclweb.org/anthology/W18-6219.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/78/12/7812f610375811a3305ae44913aa184f68bfa5fb.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.18653/v1/w18-6219"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>

Measuring Linguistic Diversity During COVID-19 [article]

Jonathan Dunn, Tom Coupe, Benjamin Adams
<span title="2021-04-03">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
This paper shows that a difference-in-differences method based on the Herfindahl-Hirschman Index can identify the bias in digital corpora that is introduced by non-local populations.  ...  Here we build a baseline for temporal variation: to what degree is the data subject to unrelated fluctuations that will reduce our ability to assign a cause-and-effect relationship to linguistic diversity  ...  For example, if an application is using Twitter to track sentiment about COVID-19, that tracking is meaningless without good information about how well it represents the population.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2104.01290v1">arXiv:2104.01290v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/qhxl6udac5g3lf7pqyrtueez5m">fatcat:qhxl6udac5g3lf7pqyrtueez5m</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210411003009/https://arxiv.org/pdf/2104.01290v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/ee/cb/eecbec7872208bf5bbea819d01dfd3978d8734a7.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2104.01290v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Sentiment Analysis with Contextual Embeddings and Self-Attention [article]

Katarzyna Biesialska, Magdalena Biesialska, Henryk Rybinski
<span title="2020-03-12">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this work, we propose a simple yet effective method for sentiment analysis using contextual embeddings and a self-attention mechanism.  ...  Finally, this work is intended as a step towards introducing a universal, multilingual sentiment classifier.  ...  The self-attention block in the encoder is called multi-head self-attention.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2003.05574v1">arXiv:2003.05574v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/plnmpgkxdnehdnqebulota6iqu">fatcat:plnmpgkxdnehdnqebulota6iqu</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200320184845/https://arxiv.org/pdf/2003.05574v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2003.05574v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Cascaded Semantic and Positional Self-Attention Network for Document Classification [article]

Juyong Jiang, Jie Zhang, Kai Zhang
<span title="2020-09-19">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this work, we propose a new architecture to aggregate the two sources of information using a cascaded semantic and positional self-attention network (CSPAN) in the context of document classification.  ...  The CSPAN uses a semantic self-attention layer cascaded with Bi-LSTM to process the semantic and positional information in a sequential manner, and then adaptively combines them through a residual connection, as below: y = s + h. Here, s ∈ ℝ^d represents the output of the first building block (semantic self-attention), and h ∈ ℝ^d stands for the output of the second building block (Bi-LSTM).  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2009.07148v2">arXiv:2009.07148v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/d2gsdptp4raozgdyr3kw4hvgj4">fatcat:d2gsdptp4raozgdyr3kw4hvgj4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200923001813/https://arxiv.org/ftp/arxiv/papers/2009/2009.07148.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2009.07148v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling [article]

Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang
<span title="2018-04-03">2018</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this paper, we propose a model, called "bi-directional block self-attention network (Bi-BloSAN)", for RNN/CNN-free sequence encoding.  ...  Recurrent neural networks (RNN), convolutional neural networks (CNN) and self-attention networks (SAN) are commonly used to produce context-aware representations.  ...  We also acknowledge the support of NVIDIA Corporation with the donation of GPU used for this research.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1804.00857v1">arXiv:1804.00857v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/wd7zzlv5b5eqfm5fwl7j4r4kea">fatcat:wd7zzlv5b5eqfm5fwl7j4r4kea</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20191027010327/https://arxiv.org/pdf/1804.00857v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/0e/f4/0ef460c47377c3b9482d8177cbcafad1730a91a5.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1804.00857v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together [article]

Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang
<span title="2019-03-26">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Neural networks equipped with self-attention have parallelizable computation, light-weight structure, and the ability to capture both long-range and local dependencies.  ...  feature-wise alignment scores for better expressive power but only requires parallelizable matrix multiplications, and 3) combines multi-head with multi-dimensional attentions, and applies a distinct positional  ...  Government through the Australian Research Council (ARC) under grants 1) LP160100630 partnership with Australia Government Department of Health and 2) LP150100671 partnership with Australia Research Alliance for  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1805.00912v4">arXiv:1805.00912v4</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/zifo5p2hbfgvzpf7enj554pqsq">fatcat:zifo5p2hbfgvzpf7enj554pqsq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200829060736/https://arxiv.org/pdf/1805.00912v4.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/66/cd/66cdaabfcd8d6707ce60281d5a36089b18cf653d.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1805.00912v4" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Self-Attentive Residual Decoder for Neural Machine Translation

Lesly Miculicich Werlen, Nikolaos Pappas, Dhananjay Ram, Andrei Popescu-Belis
<span title="">2018</span> <i title="Association for Computational Linguistics"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/d5ex6ucxtrfz3clshlkh3f6w2q" style="color: black;">Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)</a> </i> &nbsp;
The proposed model outperforms a neural MT baseline as well as a memory and self-attention network on three language pairs.  ...  Neural sequence-to-sequence networks with attention have achieved remarkable performance for machine translation.  ...  We would also like to thank James Henderson for his valuable feedback and suggestions.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.18653/v1/n18-1124">doi:10.18653/v1/n18-1124</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/naacl/WerlenPRP18.html">dblp:conf/naacl/WerlenPRP18</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/3cqj6safpzci3cqmiuia4szsjq">fatcat:3cqj6safpzci3cqmiuia4szsjq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20190228201444/http://pdfs.semanticscholar.org/90f4/0d4bdfcc7c62304c043aad25695c8ae26356.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/90/f4/90f40d4bdfcc7c62304c043aad25695c8ae26356.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.18653/v1/n18-1124"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>

Character-Level Language Modeling with Deeper Self-Attention [article]

Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, Llion Jones
<span title="2018-12-10">2018</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this paper, we show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character  ...  To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and intermediate sequence positions.  ...  Following Vaswani et al. (2017), by "transformer layer" we mean a block containing a multi-head self-attention sub-layer followed by a feed-forward network of two fully connected sub-layers.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1808.04444v2">arXiv:1808.04444v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/pzed56corracdi6vn3ltgz6vri">fatcat:pzed56corracdi6vn3ltgz6vri</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20191027143031/https://arxiv.org/pdf/1808.04444v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/95/a9/95a9ce8872576e0b7fe7ec02f178a26b5dbe6dcb.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1808.04444v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Self-Attention Attribution: Interpreting Information Interactions Inside Transformer [article]

Yaru Hao, Li Dong, Furu Wei, Ke Xu
<span title="2021-02-25">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this paper, we propose a self-attention attribution method to interpret the information interactions inside Transformer. We take BERT as an example to conduct extensive studies.  ...  Firstly, we apply self-attention attribution to identify the important attention heads, while others can be pruned with marginal performance degradation.  ...  The core component of a Transformer block is multi-head self-attention.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2004.11207v2">arXiv:2004.11207v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/uztxgswjurg7jjcpmsbabcozt4">fatcat:uztxgswjurg7jjcpmsbabcozt4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210304153746/https://arxiv.org/pdf/2004.11207v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/30/3b/303b0d5b4249badb3c125815808dd1c461bc8333.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2004.11207v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [article]

Zhongfen Deng, Hao Peng, Congying Xia, Jianxin Li, Lifang He, Philip S. Yu
<span title="2020-11-02">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this paper, we propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation, which can serve as an effective decision-making tool for the academic paper review process.  ...  We thank the reviewers for their constructive comments.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2011.00802v1">arXiv:2011.00802v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/57h2aujyufh6te5axorcpj3noq">fatcat:57h2aujyufh6te5axorcpj3noq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201105200746/https://arxiv.org/pdf/2011.00802v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/8f/2b/8f2be495f8d4be177d3fa4f90ceac39772235862.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2011.00802v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Character-Level Language Modeling with Deeper Self-Attention

Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, Llion Jones
<span title="2019-07-17">2019</span> <i title="Association for the Advancement of Artificial Intelligence (AAAI)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/wtjcymhabjantmdtuptkk62mlq" style="color: black;">PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE</a> </i> &nbsp;
In this paper, we show that a deep (64-layer) transformer model (Vaswani et al. 2017) with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks  ...  To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and intermediate sequence positions.  ...  Specifically, we use a network of transformer self-attention layers (Vaswani et al. 2017) with causal (backward-looking) attention to process fixed-length inputs and predict upcoming characters.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1609/aaai.v33i01.33013159">doi:10.1609/aaai.v33i01.33013159</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ia4at6i6vjbvtbnkvfmqqmgnge">fatcat:ia4at6i6vjbvtbnkvfmqqmgnge</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200305153757/https://aaai.org/ojs/index.php/AAAI/article/download/4182/4060" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/75/05/750528af483311163f47dedb46b35673ddae27db.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1609/aaai.v33i01.33013159"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>

Multi-Scale Self-Attention for Text Classification

Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, Zheng Zhang
<span title="2020-04-03">2020</span> <i title="Association for the Advancement of Artificial Intelligence (AAAI)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/wtjcymhabjantmdtuptkk62mlq" style="color: black;">PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE</a> </i> &nbsp;
Based on the linguistic perspective and the analysis of pre-trained Transformer (BERT) on a huge corpus, we further design a strategy to control the scale distribution for each layer.  ...  We propose a Multi-Scale Transformer which uses multi-scale multi-head self-attention to capture features from different scales.  ...  The building block of the Multi-Scale Transformer, multi-scale multi-head self-attention, provides a flexible way to introduce scale bias (local or global), and it is a replacement for the multi-head self-attention  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1609/aaai.v34i05.6290">doi:10.1609/aaai.v34i05.6290</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/p3ewzpzbmzgqveg5wvdposfzai">fatcat:p3ewzpzbmzgqveg5wvdposfzai</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201104141622/https://aaai.org/ojs/index.php/AAAI/article/download/6290/6146" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/db/9a/db9a895f9d2b8cebb1794295007dd3acf653c485.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1609/aaai.v34i05.6290"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>

Multi-Scale Self-Attention for Text Classification [article]

Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, Zheng Zhang
<span title="2019-12-02">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Based on the linguistic perspective and the analysis of pre-trained Transformer (BERT) on a huge corpus, we further design a strategy to control the scale distribution for each layer.  ...  We propose a Multi-Scale Transformer which uses multi-scale multi-head self-attention to capture features from different scales.  ...  The building block of the Multi-Scale Transformer, multi-scale multi-head self-attention, provides a flexible way to introduce scale bias (local or global), and it is a replacement for the multi-head self-attention  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1912.00544v1">arXiv:1912.00544v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/aeqhgx55rzccjmwoqyuhbkds3u">fatcat:aeqhgx55rzccjmwoqyuhbkds3u</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200906233936/https://arxiv.org/pdf/1912.00544v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/98/82/988236ed9defc9d040a5cc3844849d846c9dbd85.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1912.00544v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together

Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang
<span title="">2019</span> <i title="Association for Computational Linguistics"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/d5ex6ucxtrfz3clshlkh3f6w2q" style="color: black;">Proceedings of the 2019 Conference of the North</a> </i> &nbsp;
Neural networks equipped with self-attention have parallelizable computation, light-weight structure, and the ability to capture both long-range and local dependencies.  ...  feature-wise alignment scores for better expressive power but only requires parallelizable matrix multiplications, and 3) combines multi-head with multi-dimensional attentions, and applies a distinct positional  ...  Government through the Australian Research Council (ARC) under grants 1) LP160100630 partnership with Australia Government Department of Health and 2) LP150100671 partnership with Australia Research Alliance for  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.18653/v1/n19-1127">doi:10.18653/v1/n19-1127</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/naacl/ShenZL0Z19.html">dblp:conf/naacl/ShenZL0Z19</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/j6q4bnkdwva43jtfli622akxuy">fatcat:j6q4bnkdwva43jtfli622akxuy</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200507185049/https://www.aclweb.org/anthology/N19-1127.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/0a/6a/0a6a6f8f1a76ce48ff52af27d4928eeed5d082a3.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.18653/v1/n19-1127"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>

QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension [article]

Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le
<span title="2018-04-23">2018</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We propose a new Q&A architecture called QANet, which does not require recurrent networks: its encoder consists exclusively of convolution and self-attention, where convolution models local interactions  ...  and self-attention models global interactions.  ...  We would like to thank Samy Bengio, Lei Huang, Minjoon Seo, Noam Shazeer, Ashish Vaswani, Barret Zoph and the Google Brain Team for helpful discussions.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1804.09541v1">arXiv:1804.09541v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/clcqk45vm5dddo5tjjs7hkxggy">fatcat:clcqk45vm5dddo5tjjs7hkxggy</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20191013003455/https://arxiv.org/pdf/1804.09541v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/8c/1b/8c1b00128e74f1cd92aede3959690615695d5101.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1804.09541v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>
Showing results 1–15 of 574