Word-Level Embeddings for Cross-Task Transfer Learning in Speech Processing

Pierre Beckmann, Mikolaj Kegler, Milos Cernak
2021 29th European Signal Processing Conference (EUSIPCO)
Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. In recent years, unsupervised and self-supervised techniques for learning speech representations were developed to foster automatic speech recognition. To date, most of these approaches are task-specific and designed for within-task transfer learning between different datasets or setups of a particular task. In turn, learning task-independent representations of speech and cross-task applications of transfer learning remain less common. Here, we introduce an encoder capturing word-level representations of speech for cross-task transfer learning. We demonstrate the application of the pre-trained encoder in four distinct speech and audio processing tasks: (i) speech enhancement, (ii) language identification, (iii) speech, noise, and music classification, and (iv) speaker identification. In each task, we compare the performance of our cross-task transfer learning approach to task-specific baselines. Our results show that the speech representation captured by the encoder through pre-training is transferable across distinct speech processing tasks and datasets. Notably, even simple applications of our pre-trained encoder outperformed task-specific methods, or were comparable, depending on the task.
doi:10.23919/eusipco54536.2021.9616254
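To make the cross-task transfer setup described in the abstract concrete, the following is a minimal PyTorch sketch of the general pattern: a pre-trained word-level speech encoder is frozen and reused as a feature extractor, while only a small task-specific head is trained on the downstream task (e.g., language identification). All class names, dimensions, and the encoder architecture here are hypothetical placeholders for illustration, not the paper's actual model or training setup.

```python
# Illustrative sketch only: a frozen, pre-trained word-level speech encoder
# reused for a downstream task. Names and dimensions are assumptions, not the
# paper's implementation.
import torch
import torch.nn as nn


class WordLevelEncoder(nn.Module):
    """Stand-in for a pre-trained encoder mapping log-mel frames to word-level embeddings."""

    def __init__(self, n_mels: int = 80, embed_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_mels, embed_dim, num_layers=2, batch_first=True)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, frames, n_mels) -> (batch, embed_dim) word/utterance embedding
        _, hidden = self.rnn(mel)
        return hidden[-1]


class TransferClassifier(nn.Module):
    """Frozen pre-trained encoder + lightweight task-specific head."""

    def __init__(self, encoder: nn.Module, embed_dim: int, n_classes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():  # freeze the pre-trained weights
            p.requires_grad = False
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, n_classes)
        )

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # encoder stays fixed; only the head is trained
            emb = self.encoder(mel)
        return self.head(emb)


if __name__ == "__main__":
    encoder = WordLevelEncoder()        # in practice: load pre-trained weights here
    model = TransferClassifier(encoder, embed_dim=256, n_classes=10)
    dummy = torch.randn(4, 120, 80)     # 4 utterances, 120 frames, 80 mel bins
    logits = model(dummy)
    print(logits.shape)                 # torch.Size([4, 10])
```

Swapping the head (regression layers for speech enhancement, a softmax classifier for speaker or language identification) while keeping the encoder fixed is what makes the same pre-trained representation reusable across the four tasks listed above.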