Neural Incremental Speech Recognition Toward Real-Time Machine Speech Translation

Sashi NOVITASARI, Sakriani SAKTI, Satoshi NAKAMURA
2021 IEICE Transactions on Information and Systems
Real-time machine speech translation systems mimic human interpreters and translate incoming speech from a source language to the target language in real time. Such systems can be achieved by performing low-latency processing in the ASR (automatic speech recognition) module before passing its output to the MT (machine translation) and TTS (text-to-speech synthesis) modules. Although several studies recently proposed sequence mechanisms for neural incremental ASR (ISR), these frameworks have a more complicated training mechanism than the standard attention-based ASR because they have to decide the incremental step and learn the alignment between speech and text. In this paper, we propose attention-transfer ISR (AT-ISR), which learns knowledge from an attention-based non-incremental ASR to achieve low-delay end-to-end speech recognition. ISR comes with a trade-off between delay and performance, so we investigate how to reduce AT-ISR delay without a significant performance drop. Our experiment shows that AT-ISR achieves performance comparable to the non-incremental ASR when incremental recognition begins after the input speech reaches 25% of the complete utterance length. Additional experiments investigate the effect of ISR on translation tasks, with a focus on finding the optimum granularity of the output unit. The results reveal that our end-to-end subword-level ISR yields the best translation quality, with the lowest WER (word error rate) and the lowest uncovered-word rate.
doi:10.1587/transinf.2021edp7014 fatcat:4xxttvurvncw5lcbuyyjld3vqe