Beyond Context: A New Perspective for Word Embeddings

Yichu Zhou, Vivek Srikumar
2019. Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*
Most word embeddings today are trained by optimizing a language modeling goal of scoring words in their context, modeled as a multiclass classification problem. Despite the successes of this assumption, it is incomplete: in addition to a word's context, its orthographic or morphological properties can offer clues about its meaning. In this paper, we define a new modeling framework for training word embeddings that captures this intuition. Our framework is based on the well-studied problem of multi-label classification and, consequently, exposes several design choices: how to featurize words and contexts, which loss function to train with, and how to normalize scores. Indeed, standard models such as CBOW and FAST-TEXT are specific choices along each of these axes. We show via experiments that by combining feature engineering with embedding learning, our method can outperform CBOW using only 10% of the training data, both in standard word embedding evaluations and in text classification experiments.
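The three design axes named in the abstract (featurization, loss function, score normalization) can be made concrete with a small sketch. This is not the authors' implementation: every function name, the `w=`/`g=` feature prefixes, the trigram choice, and the averaging scheme are assumptions made purely for illustration. It contrasts a word-identity featurization with one that adds character n-grams, and a softmax-normalized multiclass loss with an unnormalized per-word logistic (multi-label) loss.

```python
import math

def char_ngrams(word, n=3):
    """Character n-grams with boundary markers (an assumed subword featurization)."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def featurize_word(word, use_subwords):
    """Word features: the word identity, optionally plus its character n-grams."""
    feats = [f"w={word}"]
    if use_subwords:
        feats += [f"g={g}" for g in char_ngrams(word)]
    return feats

def embed(features, table, dim):
    """Average the vectors of the given features; unknown features count as zero."""
    vec = [0.0] * dim
    for f in features:
        for i, v in enumerate(table.get(f, [0.0] * dim)):
            vec[i] += v
    return [v / max(len(features), 1) for v in vec]

def score(context_vec, word_vec):
    """Compatibility of a word with a context: a plain dot product."""
    return sum(c * w for c, w in zip(context_vec, word_vec))

def softmax_loss(scores, target):
    """Multiclass view: scores are normalized over all candidate words."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]

def multilabel_loss(scores, labels):
    """Multi-label view: an independent logistic loss per candidate word."""
    total = 0.0
    for s, y in zip(scores, labels):
        z = s if y else -s
        # Numerically stable log(1 + exp(-z)).
        total += max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total
```

Under these assumptions, switching `use_subwords` changes only the featurization axis, and swapping `softmax_loss` for `multilabel_loss` changes the loss and normalization axes, while `score` stays the same.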
doi:10.18653/v1/s19-1003 · dblp:conf/starsem/ZhouS19 · fatcat:udz2aedmdnd2nnjh5rtgatrfr4
