An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework

Jun Du, Qing Wang, Yan-Hui Tu, Xiao Bao, Li-Rong Dai, Chin-Hui Lee
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
We present an information fusion approach to robust recognition of microphone array speech for the recently launched 3rd CHiME Challenge. It is based on a deep learning framework with a large neural network consisting of subnets with different architectures. Multiple knowledge sources are integrated via an early fusion of normalized noisy features with different beamforming techniques, speech enhanced features, speaker related features, and other auxiliary features concatenated as the input to each subnet, and a late fusion by combining the outputs of all subnets to produce one single output set. Our experiments demonstrate that all information sources are complementary in our proposed framework. Our best system achieves an average word error rate reduction of 68% from the officially released baseline results on the test set of real data.
Index Terms: CHiME Challenge, deep learning, information fusion, microphone array, robust speech recognition
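The two fusion stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `early_fusion` and `late_fusion` are hypothetical, and plain averaging of subnet outputs is an assumed combination rule, since the abstract does not state how the outputs are combined.

```python
from typing import List, Sequence


def early_fusion(feature_streams: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-frame feature vectors from several sources
    (e.g. beamformed, enhanced, speaker-related) into one input vector."""
    fused: List[float] = []
    for stream in feature_streams:
        fused.extend(stream)
    return fused


def late_fusion(subnet_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Combine the output vectors of several subnets into one output set.

    ASSUMPTION: an unweighted average is used here purely for illustration.
    """
    if not subnet_outputs:
        raise ValueError("at least one subnet output is required")
    dim = len(subnet_outputs[0])
    if any(len(out) != dim for out in subnet_outputs):
        raise ValueError("all subnet outputs must have the same dimension")
    n = len(subnet_outputs)
    return [sum(out[k] for out in subnet_outputs) / n for k in range(dim)]


if __name__ == "__main__":
    # Three toy feature sources for one frame, concatenated as a subnet input.
    frame_input = early_fusion([[1.0, 2.0], [3.0], [4.0, 5.0]])
    print(frame_input)
    # Two toy subnet outputs over two classes, combined into one output.
    print(late_fusion([[0.5, 0.5], [1.0, 0.0]]))
```

In this sketch each subnet would receive the same concatenated vector, and only the combination of their outputs differs from a single-network setup.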
doi:10.1109/asru.2015.7404827 dblp:conf/asru/DuWTBDL15 fatcat:vjiezngqrrdqtnbzssymawsame