Nonparametric Uncertainty Estimation and Propagation for Noise Robust ASR

Dung T. Tran, Emmanuel Vincent, Denis Jouvet
2015 IEEE/ACM Transactions on Audio Speech and Language Processing  
We consider the framework of uncertainty propagation for automatic speech recognition (ASR) in highly nonstationary noise environments. Uncertainty is considered as the variance of speech distortion. Yet, its accurate estimation in the spectral domain and its propagation to the feature domain remain difficult. Existing methods typically rely on a single uncertainty estimator and propagator fixed by mathematical approximation. In this paper, we propose a new paradigm where we seek to learn more powerful mappings to predict uncertainty from data. We investigate two such possible mappings: linear fusion of multiple uncertainty estimators/propagators and nonparametric uncertainty estimation/propagation. In addition, a procedure to propagate the estimated spectral-domain uncertainty to the static Mel frequency cepstral coefficients (MFCCs), to the log-energy, and to their first- and second-order time derivatives is proposed. This results in a full uncertainty covariance matrix over both static and dynamic MFCCs. Experimental evaluation on Tracks 1 and 2 of the 2nd CHiME Challenge resulted in up to 29% and 28% relative keyword error rate reduction with respect to speech enhancement alone.
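The step from static to dynamic features can be illustrated with the standard rule for linear transforms: if y = A x and x has covariance S, then y has covariance A S A^T. The sketch below is not the authors' procedure. It assumes a single cepstral coefficient observed over three frames, uncorrelated per-frame uncertainties, and simple finite-difference derivatives; the variance values and the matrix A are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def propagate_linear(A, cov):
    """Covariance of y = A x when x has covariance cov: A cov A^T."""
    return matmul(matmul(A, cov), transpose(A))


# Hypothetical uncertainty variances of one static coefficient at
# frames t-1, t, t+1, assumed uncorrelated across frames.
variances = [0.4, 0.1, 0.2]
cov_static = [[variances[i] if i == j else 0.0 for j in range(3)]
              for i in range(3)]

# Rows: static value at t, first-order difference (c[t+1] - c[t-1]) / 2,
# second-order difference c[t+1] - 2 c[t] + c[t-1]. Illustrative choices.
A = [[0.0, 1.0, 0.0],
     [-0.5, 0.0, 0.5],
     [1.0, -2.0, 1.0]]

# 3x3 covariance over (static, delta, delta-delta) at frame t.
cov_full = propagate_linear(A, cov_static)

for row in cov_full:
    print(row)
```

Because the derivatives reuse the same frames as the static feature, the result has nonzero off-diagonal entries even though the input covariance is diagonal, which is why a full covariance matrix arises over static and dynamic features.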
doi:10.1109/taslp.2015.2450497 fatcat:gv2ypw3ykbeszbgja6b3yd5qsa