End-to-End Multi-Speaker Speech Recognition Using Speaker Embeddings and Transfer Learning

Pavel Denisov, Ngoc Thang Vu
2019 Interspeech 2019  
This paper presents our latest investigation on end-to-end automatic speech recognition (ASR) for overlapped speech. We propose to train an end-to-end system conditioned on speaker embeddings and further improved by transfer learning from clean speech. This proposed framework does not require any parallel non-overlapped speech materials and is independent of the number of speakers. Our experimental results on overlapped speech datasets show that joint conditioning on speaker embeddings and transfer learning significantly improves the ASR performance.
doi:10.21437/interspeech.2019-1130 dblp:conf/interspeech/DenisovV19 fatcat:erd5qsf4ifegnmmwvjcn5il3fm