Modelling spectro-temporal dynamics in factorisation-based noise-robust automatic speech recognition

Antti Hurmalainen, Tuomas Virtanen
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Non-negative spectral factorisation has been used successfully to separate speech from noise in automatic speech recognition, both in feature-enhancing front-ends and in direct classification. In this work, we propose using spectro-temporal 2D filters to model the dynamic properties of Mel-scale spectrogram patterns in addition to static magnitude features. The results are evaluated using an exemplar-based sparse classifier on the CHiME noisy speech database. After optimisation of static features and modelling of temporal dynamics with derivative features, we achieve an 87.4% average score over SNRs from 9 to -6 dB, a 28.1% relative reduction in word error rate compared with our previous static-only features.
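The abstract does not give the filter coefficients or the classifier's update rule, so the following is only a minimal pure-Python sketch of the general idea under stated assumptions. It filters a Mel spectrogram with a small 2D kernel (here a hypothetical `TIME_DELTA` central-difference kernel spanning one band and three frames) to obtain derivative features, splits the signed result into positive and negative parts so the stacked feature vector stays non-negative for factorisation, and then estimates sparse non-negative exemplar weights with a common KL-divergence multiplicative update plus an L1 penalty. The kernel, the sign split, the update rule and all function names (`filter2d`, `dynamic_features`, `sparse_activations`) are illustrative assumptions, not the authors' published configuration.

```python
from typing import List

Matrix = List[List[float]]  # rows = Mel bands, columns = frames

# Hypothetical temporal derivative kernel: 1 band x 3 frames (central difference).
# A kernel with more rows would also capture spectral slopes (spectro-temporal patterns).
TIME_DELTA: Matrix = [[-0.5, 0.0, 0.5]]


def filter2d(spec: Matrix, kernel: Matrix) -> Matrix:
    """Correlate (no kernel flip) a spectrogram with a 2D kernel.

    Out-of-range band/frame indices are clamped to the edge (replicate padding).
    """
    n_bands, n_frames = len(spec), len(spec[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2
    out = [[0.0] * n_frames for _ in range(n_bands)]
    for b in range(n_bands):
        for t in range(n_frames):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    bb = min(max(b + i - ch, 0), n_bands - 1)
                    tt = min(max(t + j - cw, 0), n_frames - 1)
                    acc += kernel[i][j] * spec[bb][tt]
            out[b][t] = acc
    return out


def dynamic_features(spec: Matrix) -> Matrix:
    """Stack static rows with the positive and negative parts of the filtered rows."""
    delta = filter2d(spec, TIME_DELTA)
    pos = [[max(v, 0.0) for v in row] for row in delta]
    neg = [[max(-v, 0.0) for v in row] for row in delta]
    return [row[:] for row in spec] + pos + neg  # 3 * n_bands non-negative rows


def sparse_activations(dictionary: Matrix, y: List[float],
                       sparsity: float = 0.1, iters: int = 200) -> List[float]:
    """Find non-negative weights x with y ~= sum_k x[k] * dictionary[k].

    Uses multiplicative updates minimising KL divergence plus sparsity * sum(x).
    Each dictionary entry is one flattened exemplar of the same length as y.
    """
    K, D = len(dictionary), len(y)
    x = [1.0] * K
    col_sums = [sum(a) for a in dictionary]
    for _ in range(iters):
        recon = [sum(x[k] * dictionary[k][d] for k in range(K)) for d in range(D)]
        ratio = [y[d] / max(recon[d], 1e-12) for d in range(D)]
        for k in range(K):
            num = sum(dictionary[k][d] * ratio[d] for d in range(D))
            x[k] *= num / (col_sums[k] + sparsity)
    return x
```

For example, `dynamic_features([[0.0, 1.0, 2.0, 3.0]])` returns the static row, a positive-delta row `[0.5, 1.0, 1.0, 0.5]` and an all-zero negative-delta row. Keeping the two signs in separate rows is one way to let derivative information enter a non-negative factorisation; in a classifier, the weights returned by `sparse_activations` would then be mapped to class labels through the exemplars' annotations.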
doi:10.1109/icassp.2012.6288823 dblp:conf/icassp/HurmalainenV12 fatcat:k5chrxkgkvdubotnpu72vzarsm