Spoken language identification using the SpeechDat corpus

Diamantino Caseiro, Isabel M. Trancoso
1998 5th International Conference on Spoken Language Processing (ICSLP 1998)   unpublished
Current language identification systems vary significantly in their complexity. Systems that use higher-level linguistic information perform best, but that information is hard to collect for each new language. The system presented in this paper is easily extended to new languages because it uses very little linguistic information. In fact, it needs only one language-specific phone recogniser (in our case the Portuguese one), and is trained with
... ch from each of the other languages. On the SpeechDat-M corpus, covering 6 European languages (English, French, German, Italian, Portuguese and Spanish), our system achieved an identification rate of 83.4% on 5-second utterances. This result shows an improvement of 5% over our previous version, mainly through the use of a neural network classifier. Both the baseline and the full system were implemented to run in real time.
doi:10.21437/icslp.1998-256 fatcat:7iowizc43nbj7dn3qkd55hgxxi