Spoken language understanding

Ye-Yi Wang, Li Deng, A. Acero
2005 IEEE Signal Processing Magazine  
Besides dictation, there are many other practical applications for speech recognition, including command and control, spoken dialog systems [1], [2], speech-to-speech translation [3], and multimodal interaction [4]-[6]. The success of these applications relies on the correct recognition not only of what is said but also of what is meant. In contrast to automatic speech recognition (ASR), which converts a speaker's spoken utterance into a text string, spoken language understanding (SLU) is aimed at interpreting users' intentions from their speech utterances. Traditionally, this has been accomplished by writing context-free grammars (CFGs) or unification grammars (UGs) by hand. The manual grammar authoring process is laborious and expensive, requiring much expertise. In recent years, many data-driven models have been proposed for this problem. The main purpose of this article is to provide an introduction to the statistical framework common in SLU, which has not been widely presented to signal processing readers in the past. SLU is closely related to natural language understanding (NLU), a field that has been studied for half a century. However, the problem of SLU has its own characteristics. Unlike general-domain NLU, SLU (in the current state of technology) focuses only on specific application domains. Hence, many domain-specific constraints can be included in the understanding model. Ostensibly, this may make the problem easier to solve. Unfortunately, spoken language is much noisier than written language. The inputs to an SLU system are not as well formed as those to an NLU system. They often do not comply with rigid syntactic constraints. Disfluencies such as false starts, repairs, and hesitations are pervasive, especially in conversational speech, and errors made by speech recognizers are inevitable. Therefore, robustness is one of the most important issues in SLU. On the other hand, a robust solution tends to over-generalize and introduce ambiguities, leading to a reduction in understanding accuracy. A major challenge to SLU is to strike
doi:10.1109/msp.2005.1511821 fatcat:z3ubnpeocrf7nmpn2ftxtfmism