Language coverage for mismatched crowdsourcing

Lav R. Varshney, Preethi Jyothi, Mark Hasegawa-Johnson
2016 2016 Information Theory and Applications Workshop (ITA)  
Developing automatic speech recognition technologies requires transcribed speech so as to learn the mapping from sound to text. It is traditionally assumed that transcribers need to be native speakers of the language being transcribed. Mismatched crowdsourcing is the transcription of speech by crowd workers who do not speak the language. Given that there are phonological similarities among different human languages, mismatched crowdsourcing does provide noisy data that can be aggregated to yield reliable labels. Here we discuss phonological properties of different languages in a coding-theoretic framework, and how nonnative phoneme misperception can be modeled as a noisy communication channel. We show the results of experiments demonstrating the efficacy of this information-theory-inspired modeling approach, having native English speakers and native Mandarin speakers transcribe Cantonese speech. Finally, we discuss how crowd workers whose native language background gives them the highest probability of faithful transcription can be found by solving a weighted set cover problem.
doi:10.1109/ita.2016.7888198 dblp:conf/ita/VarshneyJH16 fatcat:b6rdub6ktfcqza2jrd6vshizti
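
The weighted set cover formulation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which algorithm is used, so the code below assumes the standard greedy approximation, which repeatedly picks the set with the lowest cost per newly covered element. All names and numbers here are hypothetical: `lang_A` through `lang_D` stand for transcriber language backgrounds, `s1` through `s6` stand for sound classes of the target language that a background can perceive reliably, and the costs are placeholder weights.

```python
from typing import Dict, List, Set


def greedy_weighted_set_cover(
    universe: Set[str],
    subsets: Dict[str, Set[str]],
    costs: Dict[str, float],
) -> List[str]:
    """Greedy approximation for weighted set cover.

    At each step, choose the subset with the smallest cost per
    newly covered element, until every element is covered.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best_name = None
        best_ratio = float("inf")
        for name, items in subsets.items():
            gain = len(items & uncovered)  # elements this subset would newly cover
            if gain == 0:
                continue
            ratio = costs[name] / gain
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("the given subsets cannot cover the universe")
        chosen.append(best_name)
        uncovered -= subsets[best_name]
    return chosen


# Hypothetical data: sound classes to cover, and which classes each
# (hypothetical) language background is assumed to perceive reliably.
universe = {"s1", "s2", "s3", "s4", "s5", "s6"}
subsets = {
    "lang_A": {"s1", "s2", "s3", "s4"},
    "lang_B": {"s3", "s4", "s5"},
    "lang_C": {"s5", "s6"},
    "lang_D": {"s1", "s2", "s3", "s4", "s5", "s6"},
}
# Hypothetical weights, e.g. a recruiting cost per language background.
costs = {"lang_A": 1.5, "lang_B": 1.0, "lang_C": 1.0, "lang_D": 5.0}

selected = greedy_weighted_set_cover(universe, subsets, costs)
total_cost = sum(costs[name] for name in selected)
print(selected, total_cost)
```

With these placeholder values the greedy rule selects three cheaper backgrounds rather than the single expensive one that covers everything. The greedy method is only an approximation and is not guaranteed to return the minimum-cost cover in general.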