The CMTECH Spoken Web Search System for MediaEval 2013

Ciro Gracia, Xavier Anguera, Xavier Binefa
2013 MediaEval Benchmarking Initiative for Multimedia Evaluation  
We present a system for query-by-example search on zero-resource languages. The system compares speech patterns by fusing the contributions of two acoustic models, covering both their spectral characteristics and their temporal evolution. The spectral model uses standard Gaussian mixtures to model classical MFCC features, and we introduce phonetic priors to bias the unsupervised training of the model. In addition, we extend the standard similarity metric used when comparing posterior vectors by ... rating inter-cluster distances. To model temporal evolution patterns, we use long temporal context models. We combine the information obtained by both models when computing the similarity matrix, which allows the subsequence-DTW algorithm to find optimal subsequence alignment paths between query and reference data. The resulting alignment paths are locally filtered and globally normalized. Our experiments on MediaEval data show that this approach provides state-of-the-art results and significantly improves over both the single-model baseline and the standard-metric baseline.
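The matching pipeline summarized above (frame distances from posterior vectors, fusion of two models' similarity matrices, then subsequence DTW) can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state. Frames are taken to be posterior vectors such as GMM posteriorgrams. The per-frame distance is the common -log inner-product metric, not the authors' extended metric with inter-cluster distances. The two models' distance matrices are fused by a weighted sum with a hypothetical weight `alpha`. The alignment is textbook subsequence DTW with vertical, diagonal, and horizontal steps. The local filtering and global normalization of paths are omitted. All function names (`posterior_distance`, `distance_matrix`, `fuse`, `subsequence_dtw`) are illustrative, not taken from the authors' system.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def posterior_distance(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    # Standard posteriorgram distance: -log of the inner product of two
    # posterior vectors (eps avoids log(0) when the posteriors do not overlap).
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def distance_matrix(query: Sequence[Sequence[float]],
                    reference: Sequence[Sequence[float]]) -> Matrix:
    # One row per query frame, one column per reference frame.
    return [[posterior_distance(q, r) for r in reference] for q in query]


def fuse(d_spectral: Matrix, d_temporal: Matrix, alpha: float = 0.5) -> Matrix:
    # Hypothetical fusion rule (assumption): weighted sum of two models' distances.
    return [[alpha * a + (1.0 - alpha) * b for a, b in zip(row_s, row_t)]
            for row_s, row_t in zip(d_spectral, d_temporal)]


def subsequence_dtw(dist: Matrix) -> Tuple[float, int, int]:
    # Align the whole query (rows) to the best-matching reference span (columns).
    # Returns (accumulated cost, start column, end column).
    n, m = len(dist), len(dist[0])
    acc = [[0.0] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]
    # The first query frame may match any reference frame with no accumulated
    # prefix cost, so the path can start anywhere in the reference.
    for j in range(m):
        acc[0][j] = dist[0][j]
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            # Predecessors: vertical (i-1, j), diagonal (i-1, j-1), horizontal (i, j-1).
            cands = [(acc[i - 1][j], i - 1, j)]
            if j > 0:
                cands.append((acc[i - 1][j - 1], i - 1, j - 1))
                cands.append((acc[i][j - 1], i, j - 1))
            best_cost, pi, pj = min(cands)
            acc[i][j] = best_cost + dist[i][j]
            start[i][j] = start[pi][pj]  # propagate where this path began
    # The path may also end anywhere: take the cheapest cell in the last row.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end], start[n - 1][end], end


if __name__ == "__main__":
    # Toy two-class posteriors; the same matrix stands in for the temporal model.
    query = [[1.0, 0.0], [0.0, 1.0]]
    reference = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    d = distance_matrix(query, reference)
    cost, s, e = subsequence_dtw(fuse(d, d, alpha=0.5))
    print(s, e)  # prints: 1 2
```

Initializing the first row with each reference frame's own distance, and taking the minimum over the last row, is what turns ordinary DTW into subsequence DTW: the query must be covered end to end, but the matched reference segment may begin and end anywhere.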
dblp:conf/mediaeval/GraciaAB13 fatcat:727v7e3q7ndeje4rlvtms4vddq