Modeling Semantic Compositionality of Croatian Multiword Expressions

Jan Šnajder, Petra Almić
2015 Informatica   unpublished
A distinguishing feature of many multiword expressions (MWEs) is their semantic non-compositionality. Determining the semantic compositionality of MWEs is important for many natural language processing tasks. We address the task of modeling the semantic compositionality of Croatian MWEs. We adopt a composition-based approach within the distributional semantics framework. We build and evaluate models based on Latent Semantic Analysis and the recently proposed neural network-based Skip-gram model, and experiment with different composition functions. We show that the compositionality scores predicted by the Skip-gram additive models correlate well with human judgments (ρ=0.50). When framed as a classification task, the model achieves an accuracy of 0.64. Summary: A method for decomposing the Croatian language is developed.
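
The composition-based idea with an additive composition function can be illustrated with a minimal sketch. It assumes, as is common in this line of work but not spelled out in the abstract, that the compositionality score is the cosine similarity between the vector observed for the whole MWE and the vector composed from its constituents. The function names and the three-dimensional toy vectors are hypothetical and are not taken from the paper's data or models.

```python
import math


def add_compose(u, v):
    """Additive composition: element-wise sum of two constituent vectors."""
    return [a + b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def compositionality_score(mwe_vec, w1_vec, w2_vec):
    """Assumed scoring scheme: similarity between the observed MWE vector
    and the additive composition of its two constituents."""
    return cosine(mwe_vec, add_compose(w1_vec, w2_vec))


# Hypothetical toy vectors (illustration only, not real embeddings).
w1 = [1.0, 0.0, 0.0]
w2 = [0.0, 1.0, 0.0]
mwe_compositional = [0.9, 1.1, 0.1]      # lies close to w1 + w2
mwe_non_compositional = [0.1, 0.0, 1.0]  # points away from w1 + w2

score_comp = compositionality_score(mwe_compositional, w1, w2)
score_noncomp = compositionality_score(mwe_non_compositional, w1, w2)
```

Under these assumptions, an MWE whose observed vector lies close to the sum of its constituents receives a higher score than one whose vector points elsewhere. Thresholding such a score would give one possible way to frame the task as classification.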