Mask estimation and imputation methods for missing data speech recognition in a multisource reverberant environment

Sami Keronen, Heikki Kallasjoki, Ulpu Remes, Guy J. Brown, Jort F. Gemmeke, Kalle J. Palomäki
2013 Computer Speech and Language  
We present an automatic speech recognition system that uses a missing data approach to compensate for challenging environmental noise containing both additive and convolutive components. The unreliable and noisecorrupted ("missing") components are identified using a Gaussian mixture model (GMM) classifier based on a diverse range of acoustic features. To perform speech recognition using the partially observed data, the missing components are substituted with clean speech estimates computed
more » ... both sparse imputation and cluster-based GMM imputation. Compared to two reference mask estimation techniques based on interaural level and time difference-pairs, the proposed missing data approach significantly improved the keyword accuracy rates in all signal-to-noise ratio conditions when evaluated on the CHiME reverberant multisource environment corpus. Of the imputation methods, cluster-based imputation was found to outperform sparse imputation. The highest keyword accuracy was achieved when the system was trained on imputed data, which made it more robust to possible imputation errors.
doi:10.1016/j.csl.2012.06.005 fatcat:x7uuskic5bgqzkj4ixoml6u27e