Dealing with acoustic mismatch for training multilingual subspace Gaussian mixture models for speech recognition

Aanchan Mohan, Sina Hamidi Ghalehjegh, Richard C Rose
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
The subspace Gaussian mixture model (SGMM) has been recently proposed as an acoustic modeling technique suitable for configuring multilingual speech recognition systems. It is attractive for this purpose since its parametrization allows its "shared" model parameters to be trained with data from multiple languages [1]. In this work, we report on the results of an experimental study carried out with the goal of improving native Spanish language speech recognition performance using an existing telephone speech corpus of English spoken by speakers of Spanish origin. Compensation for sources of acoustic variability between Spanish and English language data sets was found to be important in obtaining good multilingual ASR performance. We conclude with a discussion about the notion of acoustic similarity between the state dependent parameters of the SGMM, and its possible use in effectively modelling pronunciation variation.
doi:10.1109/icassp.2012.6289016 dblp:conf/icassp/MohanGR12 fatcat:cwpjtve2cjd2hfwfvxoi7qiuhq