Fishing in a Speech Stream - Angling for a Lexicon

Peter Juel Henrichsen
2011 Nordic Conference of Computational Linguistics  
We present a learning device able to deduce a set of Danish color and shape terms. Only two data sources are available to the learner: a phonetic transcription of a human informant solving a description task, and a minimal formal model of the picture being described. The system thus contains no preconceived lexical, morphological, or semantic categories. The test data are from the phonetic corpus DanPASS, a standard Danish reference corpus. The learning device, called InShape-2, is an early [...]lt of an ambitious research programme at CMOL on data-driven language learning.
dblp:conf/nodalida/Henrichsen11 fatcat:nkv4xxom4zdnro3dd2sus6kfu4