NaDiR: Naive Distributional Response Generation

Gabriella Lapesa, Stefan Evert
2014 Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)  
This paper describes NaDiR (Naive DIstributional Response generation), a corpus-based system that takes a set of word stimuli as input and generates a response word based on association strength and distributional similarity. NaDiR participated in the CogALex 2014 shared task on multiword associations (restricted systems track), operationalizing the task as a ranking problem: candidate words from a large vocabulary are ranked by their average association or similarity to a given set of [...]. We also report on a number of experiments conducted on the shared task data, comparing first-order models (based on co-occurrence and statistical association) to second-order models (based on distributional similarity).

The Task and its Problems

The shared task datasets are derived from the Edinburgh Associative Thesaurus (Kiss et al., 1973). The Edinburgh Associative Thesaurus (henceforth, EAT) contains free associations to approximately 8000 English cue words. For each cue (e.g., visual), EAT lists all associations collected in the survey (e.g., aid, eyes, aids, see, eye, seen, sight, etc.), sorted by the number of subjects who responded with the respective word.

The CogALex shared task on multiword association is based on the EAT dataset and is in fact a reverse association task (Rapp, 2014). The top five responses for a target word are provided as stimuli (e.g., aid, eyes, aids, see, eye), and the participating systems are required to generate the original cue as a response (e.g., visual). The training and the test sets are random extracts of 2000 EAT

This work is licenced under a Creative Commons Attribution 4.0 International License.
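
The ranking formulation in the abstract (candidates ordered by their average association or similarity to the stimuli) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `rank_candidates` and `toy_score`, the tiny vocabulary, and all numeric scores are hypothetical and invented for illustration only. Excluding the stimuli themselves from the candidate list is likewise an assumption, not something stated in the text.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical toy scores, invented for illustration only.
# They are NOT taken from the paper, from EAT, or from any corpus.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("visual", "aid"): 0.6, ("visual", "eyes"): 0.7, ("visual", "see"): 0.5,
    ("hearing", "aid"): 0.8, ("hearing", "eyes"): 0.1, ("hearing", "see"): 0.1,
    ("glasses", "aid"): 0.2, ("glasses", "eyes"): 0.6, ("glasses", "see"): 0.4,
}


def toy_score(candidate: str, stimulus: str) -> float:
    """Stand-in for an association or similarity measure (hypothetical)."""
    return TOY_SCORES.get((candidate, stimulus), 0.0)


def rank_candidates(
    stimuli: Iterable[str],
    vocabulary: Iterable[str],
    score: Callable[[str, str], float] = toy_score,
) -> List[Tuple[str, float]]:
    """Rank vocabulary words by their average score against the stimuli."""
    stimuli = list(stimuli)
    if not stimuli:
        raise ValueError("at least one stimulus word is required")
    ranked = []
    for candidate in vocabulary:
        # Assumption: a stimulus word is never returned as the response.
        if candidate in stimuli:
            continue
        average = sum(score(candidate, s) for s in stimuli) / len(stimuli)
        ranked.append((candidate, average))
    # Highest average first; ties are broken alphabetically for determinism.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


if __name__ == "__main__":
    ranking = rank_candidates(
        ["aid", "eyes", "see"], ["visual", "hearing", "glasses", "aid"]
    )
    print(ranking[0][0])  # top-ranked response under the toy scores
```

In a first-order model, `score` would presumably be a statistical association measure computed from co-occurrence counts; in a second-order model, it would be a distributional similarity between word vectors. The ranking step itself is the same in both cases.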
doi:10.3115/v1/w14-4707 dblp:conf/cogalex/LapesaE14 fatcat:kyj7r7cty5cq7kov53blesibfq