Landmark-based consonant voicing detection on multilingual corpora

Xiang Kong, Xuesong Yang, Mark Hasegawa-Johnson, Jeung-Yoon Choi, Stefanie Shattuck-Hufnagel
2017 Journal of the Acoustical Society of America  
This paper tests the hypothesis that distinctive feature classifiers anchored at phonetic landmarks can be transferred cross-lingually without loss of accuracy. Three consonant voicing classifiers were developed: (1) manually selected acoustic features anchored at a phonetic landmark, (2) MFCCs (either averaged across the segment or anchored at the landmark), and(3) acoustic features computed using a convolutional neural network (CNN). All detectors are trained on English data (TIMIT),and
more » ... on English, Turkish, and Spanish (performance measured using F1 and accuracy). Experiments demonstrate that manual features outperform all MFCC classifiers, while CNNfeatures outperform both. MFCC-based classifiers suffer an F1reduction of 16% absolute when generalized from English to other languages. Manual features suffer only a 5% F1 reduction,and CNN features actually perform better in Turkish and Span-ish than in the training language, demonstrating that features capable of representing long-term spectral dynamics (CNN and landmark-based features) are able to generalize cross-lingually with little or no loss of accuracy
doi:10.1121/1.4987203 fatcat:mxqoowa55bfzlf65qsblei5ala