Hybrid Speaker-Based Segmentation System Using Model-Level Clustering

Hyoung-Gook Kim, D. Ertelt, T. Sikora
Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.  
In this paper, we present a hybrid speaker-based segmentation system that combines metric-based and model-based techniques. Without a priori information about the number of speakers or their identities, the speech stream is segmented in three stages: (1) The most likely speaker changes are detected. (2) To group segments of identical speakers, a two-level clustering algorithm using the Bayesian Information Criterion (BIC) and HMM model scores is performed. Every cluster is assumed to contain only one speaker. (3) The speaker models are reestimated from each cluster by HMM. Finally, a resegmentation step performs a more refined segmentation using these speaker models. To measure performance, we compare the segmentation results of the proposed hybrid method with those of metric-based segmentation. Results show that the hybrid approach using two-level clustering significantly outperforms direct metric-based segmentation.
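The abstract does not give the exact BIC formulation used, so the following is only a minimal sketch of the widely used delta-BIC test for a candidate speaker change, not the authors' implementation. It assumes each window of feature frames is modeled by a single Gaussian with a diagonal covariance (a simplification chosen to keep the example dependency-free). The function names, the `penalty_weight` parameter, and the `margin` parameter are hypothetical and introduced here for illustration.

```python
import math


def log_det_diag(frames):
    """Sum of log per-dimension ML variances, i.e. the log-determinant
    of a diagonal covariance estimated from `frames`."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` at index `split`.

    Positive values favor two separate Gaussians (a change point);
    negative values favor a single Gaussian (no change).
    Assumption: diagonal covariance, so each model has 2 * dim parameters.
    """
    n = len(frames)
    dim = len(frames[0])
    left, right = frames[:split], frames[split:]
    gain = 0.5 * (
        n * log_det_diag(frames)
        - len(left) * log_det_diag(left)
        - len(right) * log_det_diag(right)
    )
    n_params = 2 * dim  # dim means + dim variances
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    return gain - penalty


def best_change_point(frames, margin=10, penalty_weight=1.0):
    """Scan candidate splits and return (index, score) of the best one,
    or (None, 0.0) if no split has a positive delta-BIC."""
    best_split, best_score = None, 0.0
    for split in range(margin, len(frames) - margin + 1):
        score = delta_bic(frames, split, penalty_weight)
        if score > best_score:
            best_split, best_score = split, score
    return best_split, best_score
```

In a sketch like this, `best_change_point` would be applied to a sliding window over the feature stream, and a returned index marks a likely speaker change. The same score with the opposite interpretation (a negative delta-BIC favoring one model) is commonly used as a merge criterion when clustering segments, which is one plausible reading of the BIC-based clustering level described above.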
doi:10.1109/icassp.2005.1415221 dblp:conf/icassp/KimES05 fatcat:zsd7ior2ibe6zojd4aytnsx5tu