Iterative PLDA Adaptation for Speaker Diarization

Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier
Interspeech 2016
This paper investigates iterative PLDA adaptation for cross-show speaker diarization applied to small collections of French TV archives, based on an i-vector framework. Using the target collection itself for unsupervised adaptation, PLDA parameters are iteratively tuned while score normalization is applied for convergence. Performance is compared using combinations of target and external data for training and adaptation. Experiments on two distinct target corpora show that the proposed framework can gradually improve an existing system trained on external annotated data. These results indicate that speaker diarization on small collections of unlabeled audio archives should rely only on the availability of a sufficient bootstrap system, which can be incrementally adapted to every target collection. The proposed framework also widens the range of acceptable speaker clustering thresholds for a given performance objective.

Index Terms: speaker diarization, PLDA, unsupervised training, domain adaptation, iterative training

Within-recording speaker diarization

The front-end is composed of MFCC extraction and Viterbi-based speech activity detection, followed by standard BIC segmentation and clustering, and i-vector extraction. The BIC penalty coefficient is chosen so that the resulting clusters are pure, each representing a unique speaker. Each cluster is normalized to zero mean and unit variance, and an i-vector is extracted. Spherical Nuisance Normalization (SNN) [15] is applied to the whole i-vector dataset. Afterwards, PLDA is used to compute log-likelihood ratios (LLR) for all pairs of i-vectors [16]. The opposite of the resulting LLR matrix is called the PLDA score matrix in the remainder of the paper. For each recording, a PLDA score matrix is computed
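The pairwise scoring step described above can be illustrated with a minimal sketch. The excerpt does not say which PLDA variant is used, so this example assumes a simple two-covariance model with a between-speaker covariance B and a within-speaker covariance W, and i-vectors that are already centered and normalized. The function names (gaussian_logpdf, plda_score_matrix) and the parameters B and W are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_logpdf(z, cov):
    """Log density of a zero-mean Gaussian with covariance `cov`, evaluated at `z`."""
    _, logdet = np.linalg.slogdet(cov)
    quad = z @ np.linalg.solve(cov, z)
    return -0.5 * (z.size * np.log(2.0 * np.pi) + logdet + quad)


def plda_score_matrix(ivectors, B, W):
    """Return the opposite of the pairwise LLR matrix (lower means more similar).

    Assumption: two-covariance PLDA, x = y + e with y ~ N(0, B) shared by all
    i-vectors of a speaker and e ~ N(0, W) drawn independently per i-vector.
    """
    n, d = ivectors.shape
    total = B + W
    zero = np.zeros((d, d))
    # Joint covariance of a stacked pair under each hypothesis.
    cov_same = np.block([[total, B], [B, total]])        # same speaker
    cov_diff = np.block([[total, zero], [zero, total]])  # different speakers
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            z = np.concatenate([ivectors[i], ivectors[j]])
            llr = gaussian_logpdf(z, cov_same) - gaussian_logpdf(z, cov_diff)
            scores[i, j] = -llr  # opposite of the LLR, as in the text
    return scores


if __name__ == "__main__":
    # Toy data: the first two vectors are close, the third lies far away.
    x = np.array([[2.0, 0.0], [2.1, 0.0], [-2.0, 0.0]])
    B = 4.0 * np.eye(2)  # hypothetical between-speaker covariance
    W = 0.1 * np.eye(2)  # hypothetical within-speaker covariance
    print(plda_score_matrix(x, B, W))
```

Because the score is the opposite of the LLR, it behaves like a distance: pairs likely to share a speaker receive low values, which is the form a clustering threshold would operate on. The double loop is kept for readability; a practical system would vectorize the quadratic forms.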
doi:10.21437/interspeech.2016-572 dblp:conf/interspeech/LanCLM16 fatcat:55sdeleh2jasbbrrs3u5jhinjy