Improved Language-Independent Speaker Identification in a Non-contemporaneous Setup

Smarajit Bose, Amita Pal, Anish Mukherjee, Debasmita Das
2020 International Journal of Machine Learning and Computing  
One of the most effective approaches available in the literature for Automatic Speaker Identification is based on Gaussian Mixture Models (GMMs) with Mel Frequency Cepstral Coefficients (MFCCs) as features (Reynolds, 1995). The use of GMMs for modeling speaker identity is motivated by two things: the interpretation that the Gaussian components represent general speaker-dependent spectral shapes, and the capability of mixtures to model arbitrary densities. In an earlier work, the authors presented and demonstrated empirically (using the benchmark speech corpus NTIMIT) how combining two different well-known sets of features (MFCCs and Perceptual Linear Predictive Coefficients (PLPCs)), and using ensemble classifiers in conjunction with the Principal Component Transformation (PCT) and some robust statistical estimation techniques, significantly enhances the performance of the baseline MFCC-GMM speaker recognition system. In this work, the authors demonstrate that this approach, besides being statistically robust, is also significantly more robust than the baseline system to language mismatch in a non-contemporaneous setup. This has been done with the help of ISIS/NISIS, a bilingual dual-channel speech corpus with multi-session speech recordings.
Index Terms: Mel frequency cepstral coefficients, perceptual linear predictive coefficients, Gaussian mixture models, ensemble classifiers, classification accuracy, trimmed mean.
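To make the baseline MFCC-GMM decision rule concrete, here is a minimal sketch. It assumes each speaker's GMM (mixture weights, means, diagonal variances) has already been trained, for example by EM on that speaker's MFCC vectors, and it omits feature extraction entirely. A test utterance is assigned to the speaker whose model gives the highest average per-frame log-likelihood. The optional `trim` argument, which replaces the plain mean with a trimmed mean, is an illustrative assumption about where a robust estimator *could* enter. It is not necessarily how the authors apply trimming. The names `DiagGMM`, `trimmed_mean`, and `identify` are hypothetical. The PLP feature combination, PCT, and ensemble stages of the enhanced system are not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    """Hypothetical speaker model: a GMM with diagonal covariances (parameters assumed pre-trained)."""
    weights: List[float]          # mixture weights w_i, assumed to sum to 1
    means: List[List[float]]      # one D-dimensional mean vector per component
    variances: List[List[float]]  # one D-dimensional diagonal-variance vector per component

    def frame_log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x | model) for a single feature vector, computed with log-sum-exp for stability."""
        log_terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            # Log of a diagonal-covariance Gaussian density, summed over dimensions.
            log_gauss = -0.5 * sum(
                math.log(2 * math.pi * v) + (xd - m) ** 2 / v
                for xd, m, v in zip(x, mu, var)
            )
            log_terms.append(math.log(w) + log_gauss)
        top = max(log_terms)
        return top + math.log(sum(math.exp(t - top) for t in log_terms))


def trimmed_mean(values: Sequence[float], proportion: float) -> float:
    """Mean after dropping the lowest and highest `proportion` of the sorted values."""
    assert 0.0 <= proportion < 0.5, "proportion must be in [0, 0.5)"
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def identify(frames: Sequence[Sequence[float]], models: Sequence[DiagGMM], trim: float = 0.0) -> int:
    """Index of the model with the highest (optionally trimmed) mean frame log-likelihood.

    With trim=0.0 this is the usual rule of picking the maximum total log-likelihood,
    since dividing by the number of frames does not change the argmax.
    """
    scores = [
        trimmed_mean([m.frame_log_likelihood(x) for x in frames], trim)
        for m in models
    ]
    return max(range(len(models)), key=lambda k: scores[k])
```

Per-frame scores are averaged rather than multiplied as raw probabilities, because working in the log domain avoids numerical underflow on long utterances. A trimmed mean limits the influence of a few badly matched frames, such as noisy or mismatched segments, on the utterance-level score.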
doi:10.18178/ijmlc.2020.10.5.984