Unsupervised Compositionality Prediction of Nominal Compounds
Nominal compounds such as red wine and nut case display a continuum of compositionality, with varying contributions from the components of the compound to its semantics. This article proposes a framework for compound compositionality prediction using distributional semantic models, evaluating to what extent they capture idiomaticity compared to human judgments. For evaluation, we introduce datasets containing human judgments in three languages: English, French and Portuguese. The results
... d reveal a high agreement between the models and human predictions, suggesting that they are able to incorporate information about idiomaticity. We also present an in-depth evaluation of various factors that can affect prediction, such as model and corpus parameters and compositionality operations. General crosslingual analyses reveal the impact of morphological variation and corpus size in the ability of the model to predict compositionality, and of a uniform combination of the components for best results. Computational Linguistics Volume xx, Number xx 2 This article significantly extends and updates previous publications: 1. We consolidate the description of the datasets introduced in Ramisch et al. (2016) and Ramisch, Cordeiro, and Villavicencio (2016) by adding details about data collection, filtering and results of a thorough analysis studying the correlation between compositionality and related variables. 2. We extend the compositionality prediction framework described in Cordeiro, Ramisch, and Villavicencio (2016) by adding and evaluating new composition functions and DSMs. 3. We extend the evaluation reported in Cordeiro et al. (2016) not only by adding Portuguese, but also by evaluating additional parameters: corpus size, composition functions, and new DSMs.