Computational approaches for German particle verbs: compositionality, sense discrimination and non-literal language [article]

Maximilian Köper, Universität Stuttgart, Universität Stuttgart
2018
Anfangen (to start) is a German particle verb. Consisting of two parts, a base verb ("fangen") and particle ("an"), with potentially many or no intervening words in a sentence, particle verbs are highly frequent constructions with special properties. It has been shown that this type of verb represents a serious problem for language technology, due to particle verbs' ambiguity, ability to occur separate and seemingly unpredictable behaviour in terms of meaning. This dissertation addresses the
more » ... ning of German particle verbs via large-scale computational approaches. The three central parts of the thesis are concerned with computational models for the following components: i) compositionality, ii) senses and iii) non-literal language. In the first part of this thesis, we shed light on the phenomena by providing information on the properties of particle verbs, as well as the related and prior literature. In addition, we present the first corpus-driven statistical analysis. We use two different approaches for addressing the modelling of compositionality. For both approaches, we rely on large amounts of textual data with an algebraic model for representation to approximate meaning. We put forward the existing methodology and show that the prediction of compositionality can be improved by considering visual information. We model the particle verb senses based only on huge amounts of texts, without access to other resources. Furthermore, we compare and introduce the methods to find and represent different verb senses. Our findings indicate the usefulness of such sense-specific models. We successfully present the first model for detecting the non-literal language of particle verbs in a running text. Our approach reaches high performance by combining the established techniques from metaphor detection with particle verb-specific information. In the last part of the thesis, we approach the regularities and the meaning shift patterns. Here, we introduce a novel data collection approach for accessing the meaning components, a [...]
doi:10.18419/opus-10113 fatcat:vnkyygmzdjh4ba64rmwaw4e6li