Design and Evaluation of Metaphor Processing Systems
System design and evaluation methodologies receive significant attention in natural language processing (NLP), with systems typically evaluated on a common task and against shared datasets. This enables direct system comparison and facilitates progress in the field. However, computational work on metaphor is considerably more fragmented than similar research efforts in other areas of NLP and semantics. Recent years have seen a growing interest in computational modelling of metaphor,
with many new statistical techniques opening routes for improving system accuracy and robustness. However, the lack of a common task definition, shared dataset and evaluation strategy makes the methods hard to compare, and thus hampers our progress as a community in this area. The goal of this article is to review the system features and evaluation strategies that have been proposed for the metaphor processing task, and to analyse their benefits and downsides, with the aim of identifying the desired properties of metaphor processing systems and a set of requirements for their evaluation.

Computational Linguistics Volume xx, Number xx

(1) "President Obama is rebuilding the campaign machinery that vaulted him into office" (New York Times, 2011)
(2) 20 steps towards a modern, working democracy
(3) Time to mend our foreign policy.
(4) "She knows the nuts and bolts, and it's not the nuts and bolts inside legislation, it's the nuts and bolts of raising money, preparing the party for elections, a political consultant kind of politics." (Bzdek 2008)

These examples demonstrate how multiple properties and inferences from the domain of mechanisms are systematically projected onto our knowledge about politics. Lakoff and Johnson coined the term conceptual metaphor to describe such mappings from the source domain to the target domain. The view of an inter-conceptual mapping as the basis of metaphor was echoed by other prominent theories in the field. These include, most notably, the comparison view, formulated in the Structure-Mapping Theory of Gentner (1983), and the interaction view (Black 1962; Hesse 1966). However, it is the principles of Conceptual Metaphor Theory (CMT) that inspired and influenced much of the computational work on metaphor, and they are therefore more central to this paper.

Conceptual metaphor manifests itself in language in the form of linguistic metaphor, or metaphorical expressions. These in turn include lexical metaphor, i.e.
single-word meaning extensions (as in examples (2) and (3)), multi-word metaphorical expressions (e.g. "the government turned a blind eye to corruption") and extended metaphor, which spans longer discourse fragments.

Manifestations of metaphor are frequent in language, appearing on average in every third sentence of general-domain text, according to corpus studies (Cameron 2003; Martin 2006; Steen et al. 2010; Shutova and Teufel 2010). This makes metaphor an important subject of linguistic research, and its accurate processing essential for a range of practical NLP applications. These include, for example, (1) machine translation (MT): since a large number of metaphorical expressions are culture-specific, they represent a considerable challenge for MT (e.g. the English metaphor "to shoot down someone's arguments" cannot be literally translated into German as "Argumente abschießen", so metaphor interpretation is required); (2) opinion mining: metaphorical expressions tend to contain a strong emotional component (compare, for example, the metaphorical expression "Government loosened stranglehold on business" with its literal counterpart "Government deregulated business" (Narayanan 1999)); (3) information retrieval (IR): non-literal language without appropriate disambiguation may lead to false positives (e.g. documents describing "old school gentlemen" should not be returned for the query "school" (Korkontzelos et al. 2013)); and many others.

Since metaphor interpretation requires complex analogical comparisons and the projection of inference structures across domains, the task of automatic metaphor processing is challenging. For many years, computational work on metaphor revolved around the use of hand-coded knowledge and rules to model metaphorical associations, making the systems hard to scale. Recent years have seen a growing interest in statistical modelling of metaphor, with many new techniques opening routes for improving system accuracy and robustness.
A wide range of methods have been proposed and investigated by the community, including supervised (Gedigian et al. ...). While individual approaches tackling individual aspects of metaphor have met with success, the insights gained from these experiments are still difficult to integrate into a single computational metaphor modelling landscape, due to the lack of a unified task definition, shared dataset and well-defined evaluation standards. This hampers our progress as a community in this area. In this paper we take a step towards closing this gap: we review the recent work on computational modelling of metaphor, the tasks addressed, the system features proposed and the evaluations conducted, and analyse the relevance of different linguistic aspects of metaphor for system performance and applicability, with the aim of identifying the desired properties of metaphor processing systems and a set of requirements for their evaluation.

Considerations in the Design of a Metaphor Processing System

When designing a metaphor processing system, one faces a number of choices. Some stem from the linguistic and cognitive properties of metaphor; others concern the applicability and usefulness of the system in a wider NLP context. In this section, we analyse individual aspects of metaphor and their relevance to computational modelling, as well as their interplay in the design of a real-world system.

Linguistic considerations and levels of analysis

Linguistic considerations that inform the design of metaphor processing systems concern primarily the choice of the level (or levels) of analysis. The levels of metaphor analysis include (1) linguistic metaphor (or metaphorical expressions), (2) conceptual metaphor, (3) extended metaphor and (4) metaphorical inference. Let us consider an example of manifestations of the conceptual metaphor EUROPEAN INTEGRATION as a TRAIN JOURNEY, popular in the early nineties, at various levels.