An Analysis of Transfer Learning for Domain Mismatched Text-independent Speaker Verification

Chunlei Zhang, Shivesh Ranjan, John Hansen
2018 Odyssey 2018 The Speaker and Language Recognition Workshop  
In this paper, we present transfer learning for deep neural network based text-independent speaker verification in the presence of a severe mismatch between the enrollment and the test data. Given a pre-trained speaker embedding network developed with out-of-domain data, we explore and analyze how this pre-trained model can benefit the in-domain speaker verification task. Two alternative strategies are investigated to perform transfer learning: vanilla transfer learning (V-TL) and curriculum learning based transfer learning (CL-TL). The proposed methods are validated on the UT-SCOPE-physical speech corpus, where we create a setup that introduces mismatched evaluation conditions using neutral speech and physical-task-stressed speech. Experimental results confirm the effectiveness of both the V-TL and CL-TL techniques. Employing transfer learning based on the pre-trained model, we achieve a +47.7% relative improvement over a conventional i-vector/PLDA system and a +30.6% relative improvement over a recently proposed end-to-end system.
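As a reading aid for the relative improvement figures quoted above, the following is a minimal sketch of how such a percentage is commonly computed. It assumes a lower-is-better error metric (for example an equal error rate); the abstract does not name the metric, and the function name and input values are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative reduction of an error metric, in percent.

    Assumes lower error is better; the metric itself is an assumption,
    since the abstract does not state which one was used.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical values for illustration only (not taken from the paper):
# a baseline error of 10.0 reduced to 5.0 is a 50% relative improvement.
example = relative_improvement(10.0, 5.0)
```

Under this definition, a +47.7% relative improvement means the new system's error is about 52.3% of the baseline's error.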
doi:10.21437/odyssey.2018-26 dblp:conf/odyssey/ZhangRH18 fatcat:jo7mzr6qkfejjb6b5qmq2sdukq