Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech
[article]
2022
arXiv
pre-print
Streaming recognition and segmentation of multi-party conversations with overlapping speech is crucial for the next generation of voice assistant applications. In this work, we address the challenges identified in previous work on the multi-turn recurrent neural network transducer (MT-RNN-T) with a novel approach, the separator-transducer-segmenter (STS), which enables tighter integration of speech separation, recognition, and segmentation in a single model. First, we propose a new segmentation
arXiv:2205.05199v1
fatcat:n7a37hcpf5ddtg7bkjli6e6zaq