A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Parsimonious Morpheme Segmentation with an Application to Enriching Word Embeddings
[article]
2019
arXiv
pre-print
Traditionally, many text-mining tasks treat individual word-tokens as the finest meaningful semantic granularity. However, in many languages and specialized corpora, words are composed by concatenating semantically meaningful subword structures. Word-level analysis cannot leverage the semantic information present in such subword structures. With regard to word embedding techniques, this leads to not only poor embeddings for infrequent words in long-tailed text corpora but also weak capabilities
arXiv:1908.07832v2
fatcat:ebil4lzppbbh3dn5e4kr6tlyyq