A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
SpEx: Multi-Scale Time Domain Speaker Extraction Network
2020
IEEE/ACM Transactions on Audio Speech and Language Processing
Speaker extraction aims to mimic humans' selective auditory attention by extracting a target speaker's voice from a multi-talker environment. It is common to perform the extraction in the frequency domain and to reconstruct the time-domain signal from the extracted magnitude and estimated phase spectra. However, such an approach is adversely affected by the inherent difficulty of phase estimation. Inspired by Conv-TasNet, we propose a time-domain speaker extraction network (SpEx) that converts the
doi:10.1109/taslp.2020.2987429
fatcat:xlsfk6ulufeb3cmxhbrhicnfza