Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge

Ignacio Viñals, Pablo Gimeno, Alfonso Ortega, Antonio Miguel, Eduardo Lleida
Interspeech 2018
This paper focuses on estimating the number of speakers for diarization in the context of the DIHARD Challenge at Interspeech 2018. The evaluation seeks to improve diarization on challenging corpora (YouTube videos, meetings, court audio, etc.) containing an undetermined number of speakers whose speech contributions vary in relevance. Our proposal for the challenge is a system based on the i-vector PLDA paradigm: given an initial segmentation of the audio, we extract an i-vector representation for each acoustic fragment. These i-vectors are clustered with a Fully Bayesian PLDA. This generative model, whose latent variables act as speaker labels, produces the diarization labels by means of Variational Bayes iterations. The number of speakers is decided by comparing multiple hypotheses according to different information criteria. These criteria are built around the Evidence Lower Bound (ELBO) provided by our PLDA.
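The hypothesis comparison described above can be illustrated with a minimal sketch. The abstract does not specify the information criteria, so the BIC-style penalty below is an assumption made purely for illustration, as are the function names `penalized_elbo` and `select_num_speakers`, the `penalty_weight` parameter, and the toy ELBO values. It assumes one converged ELBO value is available per hypothesised number of speakers.

```python
import math


def penalized_elbo(elbo, n_speakers, n_segments, penalty_weight=1.0):
    """Hypothetical criterion: ELBO minus a BIC-style complexity penalty.

    This penalty is an illustrative assumption, not the criterion
    used in the paper.
    """
    return elbo - penalty_weight * 0.5 * n_speakers * math.log(n_segments)


def select_num_speakers(elbo_by_hypothesis, n_segments, penalty_weight=1.0):
    """Return the speaker count with the highest penalized score, plus all scores."""
    scores = {
        k: penalized_elbo(elbo, k, n_segments, penalty_weight)
        for k, elbo in elbo_by_hypothesis.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy ELBO values, invented for illustration (not results from the paper).
# Keys are hypothesised speaker counts; values are ELBOs after VB convergence.
toy_elbos = {1: -1250.0, 2: -1100.0, 3: -1080.0, 4: -1078.0}

best_k, scores = select_num_speakers(toy_elbos, n_segments=200)
print(best_k, scores)
```

With `penalty_weight=0.0` the selection reduces to picking the hypothesis with the highest raw ELBO, which tends to favour more speakers; the penalty term is one simple way to trade fit against model complexity.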
doi:10.21437/interspeech.2018-1841 dblp:conf/interspeech/VinalsGOML18 fatcat:465yndbim5ekhgpkuujfmxqnwq