Improved Syllable Based Acoustic Modeling by Inter-Syllable Transition Model for Continuous Chinese Speech Recognition

Hao Chao, Wenju Liu
2009 Chinese Conference on Pattern Recognition  
Accurately modeling the acoustic variability caused by coarticulation is important in continuous speech recognition. Recent research indicates that syllable units model intra-syllable coarticulation better than sub-syllable units. However, most continuous Mandarin speech recognition systems use context-dependent phones or Initials/Finals (IFs) as the basic acoustic unit, because it is difficult to collect sufficient data to train longer units. Here we present a syllable-based approach that consists of two steps. First, context-independent syllable-based acoustic models are trained; the models are initialized from intra-syllable IF-based diphones to address the sparsity of training data. Second, we capture the inter-syllable coarticulation effect by incorporating inter-syllable transition models into the recognition system. Experimental results show that acoustic models built with the presented approach improve recognition performance.

Index Terms: speech recognition, modeling unit selection, coarticulation
doi:10.1109/ccpr.2009.5344019 fatcat:td5f4detl5g3feichtot2gs5xa