Unsupervised Indexing of Conversations with Short Speaker Utterances

Uchechukwu O. Ofoegbu, Ananth N. Iyer, Robert E. Yantorno, Stanley J. Wenndt
2007 IEEE Aerospace Conference
Two speaker indexing systems for conversations are presented in this paper. The first method indexes two-speaker conversations. In this method, two reference models are judiciously chosen from the conversation so that they represent the two different speakers. Models of the other speech segments are then matched to these reference speakers using distance-based comparisons. The second technique first determines the number of participants in the conversation using a speaker-count method termed the "...ual Ratio Algorithm" (RRA), and then indexes the conversation based on this count. The RRA is an elimination process: speech segments matching a chosen set of reference models are sequentially removed from the conversation, and the relative amount of residual speech is observed to determine the count. The distance measures used to compare models are the Bhattacharyya distance, the T-square statistic and the Mahalanobis distance. The speaker-comparison decisions of all three distances are combined to improve the accuracy of the system. Speaker models are formed from Linear Predictive Cepstral Coefficients (LPCC) of voiced phonemes. The two-speaker indexing technique yielded an indexing accuracy of up to 95% when evaluated on SWITCHBOARD data. The counting-indexing technique achieved a maximum indexing accuracy of about 91% when tested on artificial conversations generated from HTIMIT data.
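The abstract names three distance measures and says their decisions are combined, but gives neither the formulas nor the fusion rule. The following is a minimal sketch of the two-speaker matching step under stated assumptions, not the authors' implementation: each model is assumed to be a Gaussian with diagonal covariance fit to feature frames (the paper may use full covariances), the T-square statistic is taken to be the two-sample Hotelling form with a pooled covariance, and the combination is assumed to be a majority vote in which each measure votes for the nearer of the two reference models. All function names and the toy data are hypothetical.

```python
import math
from statistics import fmean, variance

# Sketch only: diagonal Gaussian models and majority-vote fusion are assumptions,
# not the method stated in the paper.


def fit_model(frames):
    """Fit a diagonal-covariance Gaussian to a list of equal-length feature vectors.

    Needs at least two frames, and every dimension must vary (nonzero variance).
    """
    dims = list(zip(*frames))
    return {
        "mean": [fmean(d) for d in dims],
        "var": [variance(d) for d in dims],  # sample variance, n - 1 denominator
        "n": len(frames),
    }


def _pairs(a, b):
    # Iterate over (mean_a, mean_b, var_a, var_b) for each feature dimension.
    return zip(a["mean"], b["mean"], a["var"], b["var"])


def mahalanobis(a, b):
    # Distance between the two means, scaled by the average of the two variances.
    return math.sqrt(sum((ma - mb) ** 2 / ((va + vb) / 2) for ma, mb, va, vb in _pairs(a, b)))


def bhattacharyya(a, b):
    # Closed-form Bhattacharyya distance between two Gaussians, diagonal case.
    total = 0.0
    for ma, mb, va, vb in _pairs(a, b):
        v = (va + vb) / 2
        total += (ma - mb) ** 2 / (8 * v) + 0.5 * math.log(v / math.sqrt(va * vb))
    return total


def t_square(a, b):
    # Two-sample T-square statistic with a pooled (diagonal) covariance estimate.
    na, nb = a["n"], b["n"]
    total = 0.0
    for ma, mb, va, vb in _pairs(a, b):
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        total += (ma - mb) ** 2 / pooled
    return na * nb / (na + nb) * total


def closer_reference(segment, ref_a, ref_b):
    """Label a segment model "A" or "B": each measure votes for the nearer reference."""
    votes_for_a = sum(
        dist(segment, ref_a) < dist(segment, ref_b)
        for dist in (mahalanobis, bhattacharyya, t_square)
    )
    return "A" if votes_for_a >= 2 else "B"


# Toy two-dimensional frames standing in for LPCC vectors of voiced speech.
ref_a = fit_model([[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1], [0.1, 0.2]])
ref_b = fit_model([[3.0, 3.0], [3.2, 2.9], [2.9, 3.1], [3.1, 3.2]])
segment = fit_model([[0.1, 0.0], [0.0, 0.1], [-0.1, -0.1], [0.2, 0.1]])
print(closer_reference(segment, ref_a, ref_b))  # A
```

Letting each measure vote only on which reference is nearer avoids calibrating a separate threshold for every distance, at the cost of forcing every segment onto one of the two speakers.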
doi:10.1109/aero.2007.352977