Do speech recognizers prefer female speakers?

Martine Adda-Decker, Lori Lamel
Interspeech 2005, unpublished
In this contribution we examine large speech corpora of prepared broadcast speech and spontaneous telephone speech in American English and in French. Starting from the question of whether ASR systems behave differently on male and female speech, we look for evidence at the acoustic-phonetic, lexical and idiomatic levels to explain the observed differences. Recognition results were analysed on 3-7 hours of speech for each language and speech type condition (20 hours in total). Results consistently show a lower word error rate on female speech, ranging from 0.7 to 7% depending on the condition. An analysis of automatically produced pronunciations in speech training corpora (4000 hours of speech in total) revealed that female speakers tend to stick more consistently to standard pronunciations than male speakers. Concerning speech disfluencies, male speakers show larger proportions of filled pauses and repetitions than female speakers.
doi:10.21437/interspeech.2005-699 fatcat:vh5jy7hugjbdji4nrstwqcpzjm