Factor Analysis Based Speaker Verification Using ASR

Hang Su, Steven Wegmann
Interspeech 2016
In this paper, we propose to improve speaker verification performance by importing better posterior statistics from acoustic models trained for Automatic Speech Recognition (ASR). This approach aims to bring state-of-the-art ASR techniques to the speaker verification task. We compare statistics collected from several ASR systems, and show that those collected from deep neural networks (DNNs) trained with fMLLR features can reduce the equal error rate (EER) by more than 30% on the NIST SRE 2010 task, compared with those from DNNs trained without feature transformations. We also present a derivation of factor analysis using variational Bayes inference, and illustrate implementation details of factor analysis and probabilistic linear discriminant analysis (PLDA) in the Kaldi recognition toolkit. A general speaker verification pipeline is shown in Figure 1. Thanks to the scheme proposed in [7], one can use separate feature streams for frame posterior estimation and for the speaker ID front end. The focus of this work is comparing ASR acoustic models for frame posterior generation.
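To make the role of the two feature streams concrete, the following is a minimal sketch of how frame posteriors from one model can weight speaker features from another stream when accumulating utterance-level statistics. It assumes the standard zeroth- and first-order statistics used in factor analysis front ends, N_c = sum_t gamma_t(c) and F_c = sum_t gamma_t(c) x_t; the abstract does not state these formulas. The function name `collect_stats` and the toy data are hypothetical, and this is not the Kaldi implementation.

```python
def collect_stats(posteriors, features):
    """Accumulate zeroth- and first-order statistics for one utterance.

    posteriors: T x C nested list of per-frame class posteriors
                (assumed to come from an ASR acoustic model).
    features:   T x D nested list of speaker-ID features for the same frames
                (assumed to be a separate, frame-aligned feature stream).
    """
    if len(posteriors) != len(features):
        raise ValueError("posterior and feature streams must be frame-aligned")
    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes                       # N_c
    first = [[0.0] * dim for _ in range(num_classes)]  # F_c
    for gamma_t, x_t in zip(posteriors, features):
        for c, gamma in enumerate(gamma_t):
            zeroth[c] += gamma
            for d, x in enumerate(x_t):
                first[c][d] += gamma * x
    return zeroth, first


# Toy example: 3 frames, 2 classes, 2-dimensional speaker features.
posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
features = [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]]
zeroth, first = collect_stats(posteriors, features)
```

In this sketch, swapping the acoustic model changes only the `posteriors` input, while the speaker features stay the same. This is the sense in which different ASR models can be compared as posterior generators.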
doi:10.21437/interspeech.2016-1157 dblp:conf/interspeech/SuW16 fatcat:evvbzauie5bffbvoo5abg2xtfe