Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition

Shengkui Zhao, Chongjia Ni, Rong Tong, Bin Ma
Interspeech 2019
Robustness of automatic speech recognition (ASR) systems is a critical issue due to noise and reverberation. Speech enhancement and model adaptation have long been studied to address this issue. Recently, multi-task joint-learning schemes that combine noise-reduction and ASR criteria in a unified modeling framework have shown promising improvements, but their training relies heavily on paired clean-noisy data. To overcome this limitation, generative adversarial networks (GANs) and adversarial training have been deployed, which greatly simplify model training by removing the requirements of complex front-end design and paired training data. Despite the rapid development of GANs in computer vision, only regular GANs have so far been adopted for robust ASR. In this work, we adopt the more advanced cycle-consistency GAN (CycleGAN) to address the training-failure problem caused by mode collapse in regular GANs. Using deep residual networks (ResNets), we further expand the multi-task scheme into a multi-task multi-network joint-learning scheme for more robust noise reduction and model adaptation. Experimental results on CHiME-4 show that our proposed approach significantly improves the noise robustness of the ASR system, achieving much lower word error rates (WERs) than state-of-the-art joint-learning approaches.
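The cycle-consistency constraint mentioned above is what lets CycleGAN learn a noisy-to-clean mapping without paired data: a forward generator and a backward generator are trained so that mapping a sample through both should reproduce the original. As a minimal sketch of that loss term only (not the authors' full multi-task objective), the two generators are stood in for by toy linear maps; in the paper they would be deep ResNets, and the names `G`, `F`, and `cycle_consistency_loss` are illustrative, not from the paper:

```python
import numpy as np

# Toy stand-in "generators" (assumption: simple linear maps for illustration;
# the paper uses deep residual networks for these mappings).
def G(x, w=1.1, b=0.2):
    """Forward generator: clean-domain features -> noisy domain."""
    return w * x + b

def F(y, w=0.9, b=-0.1):
    """Backward generator: noisy-domain features -> clean domain."""
    return w * y + b

def cycle_consistency_loss(x_clean, y_noisy):
    """L1 cycle loss: E[|F(G(x)) - x|] + E[|G(F(y)) - y|].
    Penalizes both round trips, which is the term that removes the
    need for paired clean-noisy training data."""
    forward_cycle = np.mean(np.abs(F(G(x_clean)) - x_clean))
    backward_cycle = np.mean(np.abs(G(F(y_noisy)) - y_noisy))
    return forward_cycle + backward_cycle

# Synthetic stand-in feature vectors (unpaired by construction).
x = np.linspace(-1.0, 1.0, 5)   # "clean" features
y = np.linspace(-0.5, 1.5, 5)   # "noisy" features
loss = cycle_consistency_loss(x, y)
print(f"cycle-consistency loss: {loss:.4f}")
```

In the full system this term would be combined with adversarial losses for each domain and the ASR criterion, with all terms optimized jointly.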
doi:10.21437/interspeech.2019-2078 dblp:conf/interspeech/ZhaoNTM19