SLEDDED: A Proposed Dataset of Event Descriptions for Evaluating Phrase Representations

Laura Rimell, Eva Maria Vecchi
2016 Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP  
Measuring the semantic relatedness of phrase pairs is important for evaluating compositional distributional semantic representations. Many existing phrase relatedness datasets are limited to either lexical or syntactic alternations between phrase pairs, which limits the power of the evaluation. We propose SLEDDED (Syntactically and LExically Divergent Dataset of Event Descriptions), a dataset of event descriptions in which related phrase pairs are designed to exhibit minimal lexical and syntactic overlap; for example, a decisive victory - won the match clearly. We also propose a subset of the data aimed at distinguishing event descriptions from related but dissimilar phrases; for example, vowing to fight to the death - a new training regime for soldiers, which serves as a proxy for the tasks of narrative generation, event sequencing, and summarization. We describe a method for extracting candidate pairs from a corpus based on occurrences of event nouns (e.g. war) and a two-step annotation process consisting of expert annotation followed by crowdsourcing. We present examples from a pilot of the expert annotation step.
doi:10.18653/v1/w16-2525 dblp:conf/repeval/RimellV16 fatcat:kg35frkii5c3nfrqxdm3zbvze4