Codebook dependent dynamic channel estimation for Mandarin speech recognition over telephone

Huayun Zhang, Zhaobing Han, Bo Xu
2002 7th International Conference on Spoken Language Processing (ICSLP 2002)   unpublished
Automatic speech recognition in telecommunications environments still has lower accuracy than its desktop counterparts. Improving the performance of telephone-quality speech recognition is an urgent problem for practical applications. Previous work has shown that the main reason for this performance degradation is the mismatch between the testing and training sets caused by varying telephone channels. In this paper, we propose an efficient implementation ... to dynamically compensate for this mismatch. The algorithm is based on maximum-likelihood (ML) estimation of telephone channels and dynamically follows the time variations within the channels. It can handle degradation from linear channels (such as fixed telephone lines) as well as from some noisy nonlinear channels (such as some long-distance lines and wireless circuit lines, e.g. GSM). In our experiments on Mandarin large vocabulary continuous speech recognition (LVCSR) over telephone lines, the average character error rate (CER) decreases by more than 20% when this algorithm is applied. At the same time, the structural delay and computational cost of the algorithm are limited: the average delay is about 300 to 400 ms. The algorithm can therefore be embedded into practical telephone-based applications.
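The abstract does not give the estimation formulas, so the following is only a minimal sketch of the general idea of codebook-based channel tracking, not the authors' algorithm. It assumes that a linear channel appears as an additive bias on cepstral features, that each clean-speech codeword is a Gaussian with identity covariance (so the per-frame ML bias estimate is simply the frame minus its codeword), that the codeword is picked by a hard nearest-neighbour decision, and that time variation is followed with an exponential forgetting factor. The names `nearest_codeword`, `track_channel_bias`, and `forgetting` are hypothetical.

```python
def nearest_codeword(frame, codebook):
    """Return the codeword closest to `frame` in squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((f - x) ** 2 for f, x in zip(frame, c)))


def track_channel_bias(frames, codebook, forgetting=0.9):
    """Recursively estimate an additive channel bias and compensate each frame.

    Assumption (not from the paper): identity-covariance Gaussian codewords,
    hard codeword assignment, exponential forgetting of old evidence.
    """
    dim = len(codebook[0])
    bias = [0.0] * dim
    compensated = []
    for frame in frames:
        # Remove the current bias estimate before looking up the codeword.
        cleaned = [f - b for f, b in zip(frame, bias)]
        code = nearest_codeword(cleaned, codebook)
        # Per-frame ML bias under the identity-covariance assumption.
        residual = [f - c for f, c in zip(frame, code)]
        # Blend the old estimate with the new evidence to follow slow drift.
        bias = [forgetting * b + (1.0 - forgetting) * r
                for b, r in zip(bias, residual)]
        compensated.append([f - b for f, b in zip(frame, bias)])
    return bias, compensated
```

In this sketch a smaller `forgetting` value tracks channel changes faster but gives a noisier estimate. Because the bias is updated frame by frame, the sketch itself adds no look-ahead delay; the 300 to 400 ms delay reported in the abstract belongs to the authors' method and is not modelled here.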
doi:10.21437/icslp.2002-599 fatcat:2f4vvh7wdnewdaxxg7bbckhmsy