Intelligent Call Manager Based On The Integration Of Computer Telephony, Internet And Speech Processing

Chung-Hsien Wu, Yeou-Jiunn Chen, Gwo-Lang Yan
1998 International Conference on Consumer Electronics  
In this paper, an Intelligent Call Manager that integrates computer telephony, Internet, and speech processing techniques is proposed. The system can answer an incoming call, transfer the call to an extension, send a voice mail, and call a pager. Using a natural speech interface, the call manager serves callers efficiently and courteously. The call manager is composed of two main subsystems: a keyword spotting subsystem and a text-to-speech subsystem. The keyword spotting subsystem
... keyword spotting subsystem was evaluated on a test set of 2400 conversational speech utterances from 20 speakers (12 males and 8 females). At 8.5% false rejection, the proposed keyword spotting subsystem yielded a 17.2% false alarm rate. The evaluation results for the text-to-speech (TTS) conversion subsystem indicated that the average correct rate was 95.7% for intelligibility and that the mean opinion score (MOS) was 3.4 for naturalness.

Introduction

The rapid growth and development of computer and telephone integration (CTI), or computer telephony, represents tremendous potential and change in all areas of computing. Driven by application and technology trends, CTI is incorporating automatic speech recognition and text-to-speech into applications that have traditionally used a touch-tone interface. To realize the benefits of these new technologies, developers need to create applications with more natural user interfaces that provide speech communication. Rapid advances in speech recognition technology in recent years have enabled speech recognition systems to migrate from the laboratory to actual applications. For example, one application that cellular providers and local service providers now offer is name dialing. This feature enables a user to call a person by simply speaking the name, and the developer can modify the application system more efficiently. Since a naive caller cannot, in general, be expected to know how to speak to a machine, the system is designed to be flexible in accepting a wide range of user responses and behavior. For example, a user's response may include disfluencies such as hesitations, false starts, and sounds like um's and ah's. As the technology improves, a user-friendly speech recognition system is equipped with a keyword spotting capability, which allows users the flexibility to give a wide range of responses and behavior [1, 2, 3, 4, 5, 6, 7, 8].
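The false rejection and false alarm figures quoted in the abstract describe one operating point of the keyword spotter. The sketch below shows how such a pair of rates could be computed from per-utterance confidence scores at a given threshold. It assumes that each test utterance is labeled as containing a keyword or not and that the spotter accepts an utterance when its score reaches the threshold; the scoring protocol actually used in the paper is not given in this excerpt, and the function name `error_rates` and the sample numbers are illustrative only.

```python
def error_rates(scores, has_keyword, threshold):
    """Return (false_rejection_rate, false_alarm_rate) at one threshold.

    Assumption (not from the paper): one accept/reject decision per
    utterance, accepting when score >= threshold.
    """
    n_keyword = sum(1 for k in has_keyword if k)
    n_other = len(has_keyword) - n_keyword

    # Keyword utterances that were rejected.
    false_rejections = sum(
        1 for s, k in zip(scores, has_keyword) if k and s < threshold
    )
    # Non-keyword utterances that were accepted.
    false_alarms = sum(
        1 for s, k in zip(scores, has_keyword) if not k and s >= threshold
    )

    fr_rate = false_rejections / n_keyword if n_keyword else 0.0
    fa_rate = false_alarms / n_other if n_other else 0.0
    return fr_rate, fa_rate


# Made-up scores for six utterances: three with a keyword, three without.
scores = [0.9, 0.4, 0.7, 0.2, 0.6, 0.1]
has_keyword = [True, True, True, False, False, False]
fr, fa = error_rates(scores, has_keyword, threshold=0.5)
print(f"false rejection: {fr:.1%}, false alarm: {fa:.1%}")
```

Raising the threshold trades false alarms for false rejections, which is why the two rates are reported together.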
This paper addresses the fastest growing category of computer telephony applications: intelligent call management. The intelligent call manager proposed in this paper acts as a bridge between the PBX switch and the computer and adds programmed intelligence to the manner in which incoming calls are managed. Recently, computer speech processing applications have been found in areas ranging from telecommunications to computers and consumer products. By integrating telephone speech keyword spotting, text-to-speech conversion, Internet, voice mail, and pager, the intelligent call manager is designed to enhance and upgrade the key phone switch to an innovative voice processing system. Using a natural speech interface, the intelligent call manager serves callers efficiently and in a friendly manner.

Chinese is a tonal language in which the same phonetic syllable, when pronounced in different tones, gives quite distinct meanings. Conventionally, there are 408 Mandarin base syllables, regardless of tones, which are composed of 21 INITIAL's and 38 FINAL's [9]. In this paper, a two-stage keyword recognition system is proposed. In the first stage, the conventional Viterbi algorithm is employed to find the scores of the N best keyword candidates and their corresponding subsyllable boundaries. The subsyllable boundaries are then used to extract the FINAL parts of Mandarin syllables, which contain the prosodic information. In the second stage, the prosodic features of these FINAL parts are fed to their corresponding prosodic Hidden Markov Models (HMM's) and anti-prosodic HMM's to output a prosodic verification ...

The intelligent call manager is implemented on a PC platform with a Dialogic D/41ESC card as the telephone interface. The block diagram of the intelligent call manager is shown in Fig. 1. It is divided into two main subsystems: the keyword spotting subsystem and the text-to-speech subsystem. They are described in detail below.
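The two-stage keyword recognition outlined above can be illustrated with a small rescoring sketch. It takes the N-best candidates from the first stage as given and applies a second-stage prosodic check to each. Several parts are assumptions rather than the authors' method: the verification measure is taken to be a log-likelihood ratio between the prosodic and anti-prosodic HMM's, the two stages are combined by a weighted sum, and the names `Candidate`, `prosodic_verification`, `rescore`, `weight`, and `reject_below` are hypothetical. The excerpt does not state how the paper computes or combines these scores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    keyword: str
    viterbi_score: float         # stage 1: log-likelihood from the Viterbi search
    prosodic_loglik: float       # stage 2: FINAL parts under the prosodic HMM's
    anti_prosodic_loglik: float  # stage 2: FINAL parts under the anti-prosodic HMM's


def prosodic_verification(c: Candidate) -> float:
    """Assumed verification measure: log-likelihood ratio of model vs. anti-model."""
    return c.prosodic_loglik - c.anti_prosodic_loglik


def rescore(candidates: List[Candidate],
            weight: float = 1.0,
            reject_below: float = 0.0) -> Optional[Candidate]:
    """Drop candidates that fail prosodic verification, then pick the best.

    The weighted sum used for ranking is an illustrative choice.
    """
    accepted = [c for c in candidates
                if prosodic_verification(c) >= reject_below]
    if not accepted:
        return None  # no candidate survives verification
    return max(accepted,
               key=lambda c: c.viterbi_score + weight * prosodic_verification(c))


# Made-up N-best list: "A" has the best phonetic score but a poor tone match.
n_best = [
    Candidate("A", viterbi_score=-100.0, prosodic_loglik=-20.0, anti_prosodic_loglik=-18.0),
    Candidate("B", viterbi_score=-101.0, prosodic_loglik=-15.0, anti_prosodic_loglik=-22.0),
    Candidate("C", viterbi_score=-105.0, prosodic_loglik=-19.0, anti_prosodic_loglik=-19.5),
]
best = rescore(n_best)
print(best.keyword if best else "rejected")
```

In this toy list the second stage overturns the first-stage ranking, which is the kind of case a tonal language makes likely: two keywords can share a base syllable sequence and differ mainly in tone.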
Keyword Spotting Subsystem

The phonetic and prosodic features are extracted from the input speech. HMM's with continuous observation densities are adopted to model the phonetic and prosodic features. The N-best Viterbi algorithm is employed to find the scores of the syllable lattice and their corresponding subsyllable boundaries. Subsyllable
doi:10.1109/icce.1998.678264 fatcat:ta5pvcuq4jdgvibqeogw6nhqte