This paper investigates replacing i-vectors for text-independent speaker verification with embeddings extracted from a feedforward deep neural network. Long-term speaker characteristics are captured in the network by a temporal pooling layer that aggregates over the input speech. This enables the network to be trained to discriminate between speakers from variable-length speech segments. After training, utterances are mapped directly to fixed-dimensional speaker embeddings and pairs of

doi:10.21437/interspeech.2017-620 fatcat:i3atblwfivedbmurqfgo37b4te
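The role of the temporal pooling layer can be illustrated with a minimal sketch. The abstract does not say which statistics the layer aggregates, so the per-dimension mean and standard deviation used here are an assumption, and `temporal_pool` is a hypothetical name rather than anything taken from the paper. The sketch only shows how segments with different numbers of frames end up as vectors of the same size.

```python
import math
from typing import List, Sequence


def temporal_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate T frame-level vectors of size D into one vector of size 2*D.

    Assumption: the aggregation is the per-dimension mean followed by the
    per-dimension (population) standard deviation over all frames.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds


# Two toy "segments" with different lengths (2 frames vs. 4 frames),
# each frame being a 2-dimensional frame-level activation.
short_segment = [[1.0, 2.0], [3.0, 6.0]]
long_segment = [[1.0, 2.0], [3.0, 6.0], [1.0, 2.0], [3.0, 6.0]]

short_pooled = temporal_pool(short_segment)
long_pooled = temporal_pool(long_segment)

# Both outputs have length 4 regardless of the number of input frames.
print(len(short_pooled), len(long_pooled))
```

Because the pooled vector has a fixed size, any layers placed after it can operate on a whole segment at once, which is what allows a segment of arbitrary length to be mapped to a fixed-dimensional embedding.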