Vector space semantics with frequency-driven motifs

Shashank Srivastava, Eduard Hovy
2014 Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
Traditional models of distributional semantics suffer from computational issues such as data sparsity for individual lexemes and complexities of modeling semantic composition when dealing with structures larger than single lexical items. In this work, we present a frequencydriven paradigm for robust distributional semantics in terms of semantically cohesive lineal constituents, or motifs. The framework subsumes issues such as differential compositional as well as noncompositional behavior of
more » ... asal consituents, and circumvents some problems of data sparsity by design. We design a segmentation model to optimally partition a sentence into lineal constituents, which can be used to define distributional contexts that are less noisy, semantically more interpretable, and linguistically disambiguated. Hellinger PCA embeddings learnt using the framework show competitive results on empirical tasks.
doi:10.3115/v1/p14-1060 dblp:conf/acl/SrivastavaH14 fatcat:dsgyalzqyvgyjhrmdzzafvdm3e