Missing Feature Theory based Interface Between Sound Source Separation and Automatic Speech Recognition and Applying to Multiple Robots

Shunichi Yamamoto, Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi G. Okuno
2005 Journal of the Robotics Society of Japan  
Robot audition is a critical technology in creating an intelligent robot operating in daily environments. To realize such a robot audition system, we have designed a missing feature theory based interface between sound source separation and automatic speech recognition (ASR). In this interface, features distorted by speech separation are detected from input speech as missing features. The detected missing features are masked on recognition to avoid severe deterioration of recognition
more » ... . By using the interface, we developed the robot audition system which recognizes multiple simultaneous speech. We also assess its general applicability by implementing it on three different humanoids, i.e., Honda ASIMO, SIG2, and Replie of Kyoto University. By using three simultaneous speeches as benchmarks, its general applicability was confirmed. When triphone is used and a size of vocabulary is 200 words, the average word correct of three simultaneous speech are 79.7%, 78.7%, and 82.7% for ASIMO, SIG2, and Replie, respectively.
doi:10.7210/jrsj.23.743 fatcat:5orq7scbxnekdge6zjnm7ihyia