N-Best Re-scoring Approaches for Mandarin Speech Recognition

Xinxin Li, Xuan Wang, Jian Guan
2014 International Journal of Hybrid Information Technology  
The predominant language model for speech recognition is the n-gram language model, which is learned locally and usually lacks global linguistic information such as long-distance syntactic constraints. To overcome this problem, we first explore two n-best re-scoring approaches for Mandarin speech recognition. The first approach is linear re-scoring, which can combine several language models from various perspectives. The weights of these models are optimized using a minimum error rate learning method.
... discriminative approach can also be used for re-scoring with rich syntactic features. To overcome the insufficiency of speech text for the discriminative model, we propose a domain adaptation method that trains the model on a Chinese pinyin-to-character conversion dataset. We then present a cascaded approach that combines the two re-scoring models in a pipeline, taking the probability output of the linear re-scoring model as the initial weight of the discriminative model. Experimental results show that both re-scoring approaches outperform the baseline system, and the cascaded approach achieves the best performance.

... utilize various information from different sources, including word sequences, part-of-speech tags, and syntactic structures [9]. In this paper, we first explore two n-best re-scoring approaches for Mandarin speech recognition. Both re-scoring methods are used to choose the optimal word sequence from n-best lists. The linear re-scoring approach can combine multiple language models from different perspectives through a linear function. These sub-models include character models, pinyin-related models, part-of-speech models, and a dependency model. The discriminative re-scoring approach utilizes rich global features from dependency structures instead of the context-free features used in previous work [9]. However, the training text for the acoustic model might be insufficient and inappropriate for the discriminative model. We introduce a domain adaptation method that trains the discriminative model on a Chinese pinyin-to-character conversion (PTC) dataset. The PTC corpus is adequate because the data can be generated automatically from raw text. Both re-scoring approaches are evaluated on the Chinese 863 speech recognition corpus. Then, a cascaded approach is proposed to combine both models, which takes the probability output of the linear re-scoring model as the initial weight of the discriminative model.
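The linear and cascaded re-scoring steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sub-model names (`char_lm`, `pos_lm`), the feature name `dep:subj-verb`, all weights, and all scores are hypothetical. The cascade shown is only one possible reading, in which the linear score serves as a starting score that weighted global features then adjust.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    text: str
    # Log-probabilities assigned by each sub-model (hypothetical names).
    scores: Dict[str, float]
    # Sparse global features, e.g. dependency-pattern counts (hypothetical).
    features: Dict[str, float] = field(default_factory=dict)


def linear_score(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    """Weighted sum of the sub-model log-probabilities."""
    return sum(w * hyp.scores[name] for name, w in weights.items())


def cascaded_score(hyp: Hypothesis, weights: Dict[str, float],
                   feature_weights: Dict[str, float]) -> float:
    """Linear score used as the starting point, adjusted by global features."""
    adjustment = sum(feature_weights.get(name, 0.0) * value
                     for name, value in hyp.features.items())
    return linear_score(hyp, weights) + adjustment


def rescore(nbest: List[Hypothesis],
            score_fn: Callable[[Hypothesis], float]) -> Hypothesis:
    """Pick the highest-scoring hypothesis from an n-best list."""
    return max(nbest, key=score_fn)


# Toy n-best list with made-up scores.
nbest = [
    Hypothesis("candidate A", {"char_lm": -4.0, "pos_lm": -3.0},
               {"dep:subj-verb": 1.0}),
    Hypothesis("candidate B", {"char_lm": -3.0, "pos_lm": -3.5}),
]
weights = {"char_lm": 0.6, "pos_lm": 0.4}
feature_weights = {"dep:subj-verb": 0.5}

linear_best = rescore(nbest, lambda h: linear_score(h, weights))
cascaded_best = rescore(
    nbest, lambda h: cascaded_score(h, weights, feature_weights))
```

In this toy data the linear model alone prefers candidate B, while the added dependency feature moves candidate A ahead in the cascaded score, which illustrates how global features could overturn a decision made from sub-model probabilities only.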
Experimental results show that both re-scoring approaches outperform the baseline system, and that the cascaded approach can further improve performance.

We then compare the linear re-scoring approach with the baseline system on experimental groups 2, 3, and 4, where the training and development data are entirely different. The sub-models are the same as in the first experiment. The results are shown in Table 6. The improvement for groups 2, 3, and 4 is consistent with that for group 1. The three experiments show that the linear re-scoring approach also outperforms the baseline system when the training data and development data differ. The linear re-scoring approach requires less training data and generalizes better than the discriminative re-scoring approach, as shown in the next subsection.
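Because the linear model has only one weight per sub-model, its weights can be tuned on a small development set. The sketch below illustrates the idea with an exhaustive grid search that minimizes total character errors on development n-best lists. This is a deliberately simplified stand-in, not the minimum error rate learning method used in the paper, and the sub-model names, grid values, and toy data are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# A hypothesis is (text, sub-model log-probs); a dev item is (reference, n-best).
Hyp = Tuple[str, Dict[str, float]]
DevItem = Tuple[str, List[Hyp]]


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def best_hypothesis(nbest: List[Hyp], weights: Dict[str, float]) -> str:
    """Text of the hypothesis with the highest weighted score."""
    return max(nbest,
               key=lambda h: sum(w * h[1][m] for m, w in weights.items()))[0]


def total_errors(dev: List[DevItem], weights: Dict[str, float]) -> int:
    """Sum of character errors of the selected hypotheses over the dev set."""
    return sum(edit_distance(ref, best_hypothesis(nbest, weights))
               for ref, nbest in dev)


def tune_weights(dev: List[DevItem], models: Sequence[str],
                 grid: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try every weight combination on the grid; keep the lowest-error one."""
    best_w, best_err = None, None
    for combo in product(grid, repeat=len(models)):
        weights = dict(zip(models, combo))
        err = total_errors(dev, weights)
        if best_err is None or err < best_err:
            best_w, best_err = weights, err
    return best_w, best_err


# Toy development set with made-up scores; the first hypothesis in each
# list contains one character error.
dev = [
    ("abcd", [("abxd", {"char_lm": -4.0, "pos_lm": -4.0}),
              ("abcd", {"char_lm": -5.0, "pos_lm": -2.0})]),
    ("wxyz", [("wxzz", {"char_lm": -2.5, "pos_lm": -5.0}),
              ("wxyz", {"char_lm": -3.0, "pos_lm": -3.0})]),
]
best_w, best_err = tune_weights(dev, ["char_lm", "pos_lm"])
```

A grid search scales exponentially with the number of sub-models, so it is only practical for a handful of weights; it is used here purely to make the tuning objective concrete.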
doi:10.14257/ijhit.2014.7.2.26 fatcat:zotvkrfyhfaobcsggblms5kvse