Automatic phonetic-unit selection and modelling techniques for forensic voice comparison
[thesis]
Chee Cheun Huang
2013
Acoustic phonetic approaches to Forensic Voice Comparison (FVC) have traditionally involved a labor-intensive process of identifying speaker-discriminative phonetic units in speech for speaker characterization in forensic court cases. Automatic FVC systems have employed Gaussian Mixture Model Universal Background Model (GMM-UBM) modelling without explicitly accounting for phonetic unit selection, and with little research into the supervector-based techniques prevalent in recent speaker
... on studies. The goal of this thesis is therefore to improve the efficiency and performance of FVC systems by investigating automatic techniques for detecting speaker-discriminative speech segments, and new modelling techniques that complement conventional FVC systems. A study of Hidden Markov Model (HMM)-based automatic phonetic segmentation in GMM-UBM FVC systems demonstrates that nasals and vowels contribute the most to system improvement. An investigation of phone recognizer endpoint accuracy demonstrates a trade-off between validity and reliability as a function of the duration of recognized tokens. A novel hybrid HMM/GMM-based automatic phonetic selector is proposed that achieves better phonetic-unit detection accuracy (a reduced miss rate) than a conventional HMM-based automatic selector. Fusing a system based on manually selected tokens with a baseline system based on all speech-active segments yielded substantial FVC improvement across various database conditions; incorporating the automatic phonetic selector, designed for a near-zero miss rate, reduced the human effort of manual token selection by approximately 50%. A novel adaptation and fusion strategy, termed Separate MAP (SMAP) adaptation, is proposed for GMM-UBM modelling; it yielded substantial FVC improvements and was more robust under limited-data conditions than conventional mean-only or full MAP adaptation. The strategy involves fusing multiple MAP-adapted sub-configurations wherein smaller subsets of GMM parameters are MAP adapted s [...]
doi:10.26190/unsworks/16537
fatcat:nxlnckxcprfaldtiimkewtoijq