Development of a spontaneous large vocabulary speech recognition system for Qatari Arabic

Mohamed Elmahdy
2013 Qatar Foundation Annual Research Forum Proceedings  
A major problem in dialectal Arabic speech recognition is the sparsity of speech resources. In this paper, a transfer learning framework is proposed that jointly uses a large amount of Modern Standard Arabic (MSA) data and a small amount of dialectal Arabic data to improve acoustic and language modeling. The Qatari Arabic (QA) dialect was chosen as a typical example of an under-resourced Arabic dialect. A wide-band speech corpus was collected and transcribed from several Qatari TV series and talk-show programs. A large vocabulary speech recognition baseline system was built using the QA corpus. The proposed MSA-based transfer learning technique was carried out by applying orthographic normalization, phone mapping, data pooling, acoustic model adaptation, and system combination. The proposed approach can achieve a relative reduction in word error rate (WER) of more than 28%.
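As a minimal sketch of how the reported metric is usually computed, the Python below derives WER from a word-level edit distance and then the relative reduction between two systems. The function names `word_error_rate` and `relative_wer_reduction` are hypothetical, and the numbers in the usage lines are illustrative only; they are not results from the paper, which does not describe its scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline WER removed by the improved system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# Illustrative values only (assumed, not taken from the paper).
wer = word_error_rate("a b c d", "a x c")
reduction = relative_wer_reduction(0.50, 0.36)
```

A relative reduction is measured against the baseline WER, so it is not the same as the absolute difference in percentage points between the two systems.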
doi:10.5339/qfarf.2013.ictp-053 fatcat:rhpntl44bngdfg3vwdegy4qbwq