Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality
[article]
2021
arXiv
pre-print
Our char2subword module builds representations from characters out of the subword vocabulary, and it can be used as a drop-in replacement of the subword embedding table. ...
To alleviate these challenges, we propose a character-based subword module (char2subword) that learns the subword embedding table in pre-trained models like BERT. ...
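The snippets above describe a module that builds subword representations from characters so it can replace a lookup table. The following is a minimal sketch of that interface, assuming a toy character table with random values and mean pooling as the composition function. Mean pooling is a deliberately simple stand-in, not the paper's learned architecture, and every name here (`make_char_table`, `compose`, `mse`, `DIM`) is hypothetical.

```python
import random

DIM = 4  # hypothetical embedding size, kept tiny for illustration


def make_char_table(alphabet, dim=DIM, seed=0):
    """Hypothetical character embedding table filled with random values."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for ch in alphabet}


def compose(subword, char_table, dim=DIM):
    """Build a subword vector from its characters by mean pooling.

    Stand-in for a learned character encoder. Characters missing from the
    table are skipped; an empty result falls back to the zero vector.
    """
    vecs = [char_table[ch] for ch in subword if ch in char_table]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def mse(a, b):
    """Mean squared error. A training objective could minimise this between
    compose(s) and a pre-trained table entry for s (an assumption here)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


chars = make_char_table("abcdefghijklmnopqrstuvwxyz#")
# Any string gets a vector, including strings outside a subword vocabulary.
vec = compose("embed", chars)
typo_vec = compose("embde", chars)
print(len(vec), mse(vec, typo_vec))
```

Because mean pooling ignores character order, the permuted misspelling "embde" lands on (numerically almost) the same vector as "embed". That illustrates robustness to noisy input, but it also shows why a real module would use an order-aware encoder.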
Acknowledgments This work was partially funded by the National Science Foundation under grant #1910192. ...
arXiv:2010.12730v3
fatcat:fnaehbuerfepjphitirv5pfdcq
Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality
2021
Findings of the Association for Computational Linguistics: EMNLP 2021
unpublished
Our char2subword module builds representations from characters out of the subword vocabulary, and it can be used as a drop-in replacement of the subword embedding table. ...
To alleviate these challenges, we propose a character-based subword module (char2subword) that learns the subword embedding table in pre-trained models like BERT. ...
Acknowledgments This work was partially funded by the National Science Foundation under grant #1910192. ...
doi:10.18653/v1/2021.findings-emnlp.141
fatcat:fg7rcen6mjf6zpwzvnvhx4fnze
Wine is Not v i n. – On the Compatibility of Tokenizations Across Languages
[article]
2021
arXiv
pre-print
Typically, subword tokenization algorithms such as byte pair encoding and WordPiece are used. ...
In this work, we investigate the compatibility of tokenizations for multilingual static and contextualized embedding spaces and propose a measure that reflects the compatibility of tokenizations across ...
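Byte pair encoding, named in the snippet above, learns a vocabulary by repeatedly merging the most frequent adjacent symbol pair. The following is a minimal sketch of the classic merge-learning loop, assuming a small hypothetical word-frequency dictionary and an end-of-word marker `</w>`. It is not the code used in the listed paper, and WordPiece uses a different merge criterion that is not shown here.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges from a {word: frequency} dict."""
    vocab = {tuple(word) + ("</w>",): f for word, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Ties resolve to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


merges, vocab = learn_bpe({"wine": 3, "vin": 2, "vine": 2}, num_merges=3)
print(merges)
```

With these toy counts the learned merges are `('i', 'n')`, `('in', 'e')`, then `('ine', '</w>')`. As a result, "wine" and "vine" share the unit `ine</w>`, while "vin" stays split as `v`, `in`, `</w>`. This is the kind of cross-word, and potentially cross-language, tokenization mismatch the snippet refers to.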
The second author was supported by the Bavarian research institute for digital transformation (bidt) through their fellowship program. ...
arXiv:2109.05772v1
fatcat:fw3xz7agm5bt7odh6zlto4seue