Robust Multi-stream Keyword and Non-linguistic Vocalization Detection for Computationally Intelligent Virtual Agents [chapter]

Martin Wöllmer, Erik Marchi, Stefano Squartini, Björn Schuller
2011 Lecture Notes in Computer Science  
Systems for keyword and non-linguistic vocalization detection in conversational agent applications need to be robust with respect to background noise and different speaking styles. Focussing on the Sensitive Artificial Listener (SAL) scenario, which involves spontaneous, emotionally colored speech, this paper proposes a multi-stream model that applies the principle of Long Short-Term Memory to generate context-sensitive phoneme predictions which can be used for keyword detection. Further, we investigate the incorporation of noisy training material in order to create noise-robust acoustic models. We show that both strategies can improve recognition performance when evaluated on spontaneous human-machine conversations as contained in the SEMAINE database.
doi:10.1007/978-3-642-21090-7_58 fatcat:vhzpykjkmfhvtg7vnkattjry6e