A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf
.
Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-switching Speech Recognition
[article]
2020
arXiv
pre-print
In this paper, we conduct data selection analysis in building an English-Mandarin code-switching (CS) speech recognition (CSSR) system, which is aimed for a real CSSR contest in China. The overall training sets have three subsets, i.e., a code-switching data set, an English (LibriSpeech) and a Mandarin data set respectively. The code-switching data are Mandarin dominated. First of all, it is found using the overall data yields worse results, and hence data selection study is necessary. Then to
arXiv:2006.07094v2
fatcat:g5pql34bozdhxgaj4e76jphyp4