Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality [article]

Gustavo Aguilar, Bryan McCann, Tong Niu, Nazneen Rajani, Nitish Keskar, Thamar Solorio
2021 arXiv   pre-print
Our char2subword module builds representations from characters out of the subword vocabulary, and it can be used as a drop-in replacement of the subword embedding table.  ...  To alleviate these challenges, we propose a character-based subword module (char2subword) that learns the subword embedding table in pre-trained models like BERT.  ...  Acknowledgments This work was partially funded by the National Science Foundation under grant #1910192.  ... 
arXiv:2010.12730v3 fatcat:fnaehbuerfepjphitirv5pfdcq

Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality

Gustavo Aguilar, Bryan McCann, Tong Niu, Nazneen Rajani, Nitish Shirish Keskar, Thamar Solorio
2021 Findings of the Association for Computational Linguistics: EMNLP 2021   unpublished
Our char2subword module builds representations from characters out of the subword vocabulary, and it can be used as a drop-in replacement of the subword embedding table.  ...  To alleviate these challenges, we propose a character-based subword module (char2subword) that learns the subword embedding table in pre-trained models like BERT.  ...  Acknowledgments This work was partially funded by the National Science Foundation under grant #1910192.  ... 
doi:10.18653/v1/2021.findings-emnlp.141 fatcat:fg7rcen6mjf6zpwzvnvhx4fnze

Wine is Not v i n. – On the Compatibility of Tokenizations Across Languages [article]

Antonis Maronikolakis, Philipp Dufter, Hinrich Schütze
2021 arXiv   pre-print
Typically, subword tokenization algorithms such as byte pair encoding and WordPiece are used.  ...  In this work, we investigate the compatibility of tokenizations for multilingual static and contextualized embedding spaces and propose a measure that reflects the compatibility of tokenizations across  ...  The second author was supported by the Bavarian research institute for digital transformation (bidt) through their fellowship program.  ... 
arXiv:2109.05772v1 fatcat:fw3xz7agm5bt7odh6zlto4seue