Robust Recognition of Binaural Speech Signals Using Techniques Based on Human Auditory Processing

Anjali Menon
Automatic Speech Recognition (ASR) engines are extremely susceptible to noise. Voice-assisted devices are increasingly prevalent, and they need to recognize speech accurately in a variety of complex listening environments, including those with background noise, reverberation, and multiple talkers. The human auditory system, on the other hand, is very good at understanding speech even in extremely challenging environments. It might, therefore, be useful to apply our knowledge of human hearing to develop techniques that lead to robust speech recognition, that is, to bring techniques grounded in human auditory processing to bear on ASR. In this thesis, we discuss a number of techniques that address the robust recognition of binaural signals in the presence of reverberation and multiple talkers, since these conditions significantly degrade ASR performance. The techniques discussed here roughly follow the manner in which the auditory system achieves noise robustness. The fundamental idea behind all of the proposed techniques is that sounds emanating from the same source exhibit some degree of coherence. We aim to use this property to isolate the target signal more effectively, leading to better speech recognition accuracy. Three techniques are proposed. The Interaural Cross-correlation-based Weighting (ICW) algorithm looks for coherence across sensors using signal envelopes in order to isolate signals coming from the same location. To reduce the effect of reverberation, steady-state suppression is applied as an initial step. The ICW algorithm combined with steady-state suppression leads to significant improvements in ASR accuracy. The Coherence-to-Diffuse Ratio-based Weighting (CDRW) algorithm uses a model-based technique to evaluate the ratio of coherent energy to diffuse energy in a given signal. This leads to significantly better ASR performance. The third technique is the Cross-Correlation across Frequency (CCF) [...]
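To illustrate the envelope-coherence idea behind ICW, here is a minimal sketch, not the thesis's implementation. It assumes the left and right signals have already been split into frequency channels and envelope-detected, and it works on one channel at a time. The function names (`frame_coherence`, `icw_weights`, `apply_weights`), the non-overlapping frame length, the mean removal, the clipping to [0, 1] and the optional `exponent` are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def frame_coherence(left: Sequence[float], right: Sequence[float]) -> float:
    """Normalized (Pearson) correlation of two envelope frames, clipped to [0, 1].

    Mean removal is an assumption of this sketch: raw envelopes are
    non-negative, so an uncentred correlation would be high even for
    unrelated left/right signals.
    """
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    dev_l = [x - mean_l for x in left]
    dev_r = [y - mean_r for y in right]
    num = sum(a * b for a, b in zip(dev_l, dev_r))
    den = math.sqrt(sum(a * a for a in dev_l) * sum(b * b for b in dev_r))
    if den == 0.0:
        return 0.0  # flat envelope in either ear: treat the frame as incoherent
    return max(0.0, min(1.0, num / den))


def icw_weights(left_env: Sequence[float], right_env: Sequence[float],
                frame_len: int, exponent: float = 1.0) -> List[float]:
    """One weight per full, non-overlapping frame of a single frequency channel."""
    weights = []
    for start in range(0, len(left_env) - frame_len + 1, frame_len):
        c = frame_coherence(left_env[start:start + frame_len],
                            right_env[start:start + frame_len])
        weights.append(c ** exponent)  # hypothetical sharpening exponent
    return weights


def apply_weights(left: Sequence[float], right: Sequence[float],
                  weights: Sequence[float], frame_len: int) -> List[float]:
    """Average the two ears and scale each full frame by its weight.

    Samples after the last full frame are dropped in this sketch.
    """
    out = []
    for i, w in enumerate(weights):
        for j in range(i * frame_len, (i + 1) * frame_len):
            out.append(0.5 * (left[j] + right[j]) * w)
    return out


if __name__ == "__main__":
    # Frame 1: identical envelopes in both ears (coherent, e.g. a direct source).
    # Frame 2: opposite envelope fluctuations (incoherent).
    left = [1, 2, 3, 2, 1, 2, 1, 2]
    right = [1, 2, 3, 2, 2, 1, 2, 1]
    w = icw_weights(left, right, frame_len=4)
    print(w)
    print(apply_weights(left, right, w, frame_len=4))
```

In this toy input the first frame receives a weight of 1 and the second a weight of 0, so the coherent segment is passed and the incoherent one is suppressed. Weighting per time-frequency unit, rather than per broadband frame, reflects the channel-wise processing assumed above.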
doi:10.1184/r1/7813730.v1