Distant Speech Recognition Using a Microphone Array Network

Alberto Yoshihiro NAKANO, Seiichi NAKAGAWA, Kazumasa YAMAMOTO
2010 IEICE Transactions on Information and Systems
In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thus enhancing the output signal. On the other hand, the orientation angle is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking.
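As a rough illustration of how an estimated speaker position can set the time delays of a delay-and-sum beamformer, the sketch below derives per-microphone delays from geometry and averages the aligned channels. It is not the authors' implementation: the function names, the speed-of-sound constant, the rounding to whole samples, and the use of a known position in place of the ANN estimate are all assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def steering_delays(source_pos, mic_positions, sample_rate):
    """Return each microphone's delay, in whole samples, relative to the
    microphone closest to the (estimated) source position."""
    distances = [math.dist(source_pos, mic) for mic in mic_positions]
    nearest = min(distances)
    return [
        round((d - nearest) / SPEED_OF_SOUND * sample_rate) for d in distances
    ]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average the aligned samples."""
    # Only keep the span for which every shifted channel still has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]
```

With accurate delays, the speech components add coherently while uncorrelated noise is averaged down, which is why a better position estimate can translate into a cleaner output signal.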
To compensate for the effect of the transmission channel inside a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated; it shows better performance than the conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.
key words: distant speech recognition, microphone array network, GMM-based CMN, speaker's position and orientation estimation
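For reference, the conventional CMN baseline mentioned in the abstract can be sketched as a per-utterance mean subtraction over cepstral frames. The abstract does not describe the GMM-based variant in enough detail to reproduce it, so only the baseline is shown; the function name and the list-of-lists frame layout are assumptions for this example.

```python
def cepstral_mean_normalization(frames):
    """Conventional CMN: subtract the per-coefficient mean, computed over
    the whole utterance, from every cepstral frame."""
    if not frames:
        return []
    dim = len(frames[0])
    # Mean of each cepstral coefficient across all frames of the utterance.
    mean = [sum(frame[k] for frame in frames) / len(frames) for k in range(dim)]
    return [[frame[k] - mean[k] for k in range(dim)] for frame in frames]
```

Because the mean is estimated from the utterance itself, a short utterance yields few frames and therefore a less reliable estimate, which is the situation the abstract identifies as the weak point of the conventional method.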
doi:10.1587/transinf.e93.d.2451 fatcat:aiyu6yfmpzgvdhxy2qcgtcw5xu