Semi-supervised G2P bootstrapping and its application to ASR for a very under-resourced language: Iban

Sarah Flora Samson Juan, Laurent Besacier, Solange Rossato
2014 Workshop on Spoken Language Technologies for Under-resourced Languages  
This paper describes our experiments and results on using a locally dominant language in Malaysia, Malay, to bootstrap automatic speech recognition (ASR) for a very under-resourced language: Iban, which is also spoken in Malaysia, on the island of Borneo. Resources for building an Iban speech recognition system were nonexistent, so we took advantage of a language from the same family that shares several similarities with it. First, to deal with the pronunciation dictionary, we proposed a bootstrapping strategy to develop an Iban pronunciation lexicon from a Malay one. A hybrid version, mixing Malay and Iban pronunciations, was also built and evaluated. Following this, we experimented with three Iban ASR systems, each relying on one of the three pronunciation dictionaries: Malay, Iban, or hybrid. Our best word error rates (WER) for Iban ASR with each lexicon were 20.82% (Malay G2P), 21.90% (Iban G2P), and 20.60% (Hybrid G2P). Finally, we applied system combination across all of the systems and obtained an improved WER of 19.22%.
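The percentages reported in the abstract are word error rates, so lower is better. As a rough illustration only, the sketch below computes WER as word-level edit distance divided by reference length and then the relative change between two of the reported figures. The function names and the placeholder token strings are assumptions made for this example; they are not taken from the paper, and this is not the authors' scoring tool.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative WER reduction in percent, going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


# Placeholder tokens, not real Iban data: one substitution and one deletion over four words.
print(wer("w1 w2 w3 w4", "w1 x w3"))

# Figures from the abstract: best single system (hybrid G2P) versus system combination.
print(round(relative_reduction(20.60, 19.22), 2))
```

Under this definition, moving from 20.60% to 19.22% is an absolute reduction of 1.38 points, which corresponds to a relative reduction of roughly 6.7%.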
dblp:conf/sltu/JuanBR14 fatcat:mus4jszk25ekrhak5o3nz4wbsm