Decision Making Based on Cohort Scores for Speaker Verification
[article]
2016
arXiv
pre-print
Decision making is an important component in a speaker verification system. ...
For the conventional GMM-UBM architecture, the decision is usually conducted based on the log likelihood ratio of the test utterance against the GMM of the claimed speaker and the UBM. ...
Recently, deep learning has been applied to speaker verification and gained much interest [7], [8]. Within a speaker verification system, decision making is an important component [9]. ...
arXiv:1609.08419v1
fatcat:negwpzrryvbjjmf4wff5tkg6q4
Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition
[article]
2018
arXiv
pre-print
In this paper, we propose a neural network based compensation scheme (termed deep discriminant analysis, DDA) for i-vector based speaker recognition, which shares the spirit of LDA. ...
Optimized against softmax loss and center loss at the same time, the proposed method learns a more compact and discriminative embedding space. ...
We term this NN-based compensation method as Deep Discriminant Analysis (DDA), for comparison with LDA or NDA. ...
arXiv:1805.01344v1
fatcat:asptlxvmlvetjbi7v7aac4a4ry
RSKNet-MTSP: Effective and Portable Deep Architecture for Speaker Verification
[article]
2021
arXiv
pre-print
the depthwise separable convolutions with low-rank factorization of weight matrices. ...
The convolutional neural network (CNN) based approaches have shown great success for speaker verification (SV) tasks, where modeling long temporal context and reducing information loss of speaker characteristics ...
From the description, the essential part of a DNN-based SV system is to build an effective deep embedding architecture for extracting discriminative features between different speakers. ...
arXiv:2108.13249v1
fatcat:snrpr5lvlfeldnz525ug4rqfkm
Analysis of Length Normalization in End-to-End Speaker Verification System
[article]
2018
arXiv
pre-print
The classical i-vectors and the latest end-to-end deep speaker embeddings are the two representative categories of utterance-level representations in automatic speaker verification systems. ...
In this paper, we explore how the neural network learns length-normalized deep speaker embeddings in an end-to-end manner. ...
is the output categories, y_i is the deep normalized embedding, c_i is the corresponding ground truth label, and W and b are the weights and bias for the last layer of the network which acts as a back-end ...
arXiv:1806.03209v2
fatcat:kta6anuhsjcyjjgsjswow2oaim
On Residual CNN in text-dependent speaker verification task
[article]
2017
arXiv
pre-print
Deep learning approaches are still not very common in the speaker verification field. ...
We investigate the possibility of using a deep residual convolutional neural network with spectrograms as input features in the text-dependent speaker verification task. ...
A speaker discriminative approach is the most natural way for speaker verification. [12] describes a DNN for extracting a small speaker footprint which can be used to discriminate between speakers. ...
arXiv:1705.10134v2
fatcat:xql2kkkjtrfrzfutwi6owtqbdy
Analysis of Length Normalization in End-to-End Speaker Verification System
2018
Interspeech 2018
The classical i-vectors and the latest end-to-end deep speaker embeddings are the two representative categories of utterance-level representations in automatic speaker verification systems. ...
In this paper, we explore how the neural network learns length-normalized deep speaker embeddings in an end-to-end manner. ...
is the output categories, y_i is the deep normalized embedding, c_i is the corresponding ground truth label, and W and b are the weights and bias for the last layer of the network which acts as a back-end ...
doi:10.21437/interspeech.2018-92
dblp:conf/interspeech/CaiCL18
fatcat:fn754g4sa5gwzej5z662j3s7yq
Attention Mechanism in Speaker Recognition: What Does It Learn in Deep Speaker Embedding?
[article]
2018
arXiv
pre-print
This paper presents an experimental study on deep speaker embedding with an attention mechanism that has been found to be a powerful representation learning technique in speaker recognition. ...
In this framework, an attention model works as a frame selector that computes an attention weight for each frame-level feature vector, in accord with which an utterance-level representation is produced ...
deep speaker embedding network, and (3) Applying attention weights to statistics for i-vector extraction. ...
arXiv:1809.09311v1
fatcat:dmoazbooffgmpdde4vunw5i63e
Partial AUC optimization based deep speaker embeddings with class-center learning for text-independent speaker verification
[article]
2019
arXiv
pre-print
Deep embedding based text-independent speaker verification has demonstrated superior performance to traditional methods in many challenging scenarios. ...
Thus, most state-of-the-art deep embedding methods use the identification loss functions with softmax output units or their variants. ...
CONCLUSIONS This paper presented a method to train deep embedding based text-independent speaker verification with a new verification loss function, pAUC. ...
arXiv:1911.08077v1
fatcat:3niehrfdgjavfnuemrdrrcraay
Speaker diarization through speaker embeddings
2015
2015 23rd European Signal Processing Conference (EUSIPCO)
This paper proposes to learn a set of high-level feature representations through deep learning, referred to as Speaker Embeddings, for speaker diarization. ...
Although learned through identification, speaker embeddings are shown to be effective for speaker verification in particular to recognize speakers unseen in the training set. ...
In that context, the hidden layers of the Deep Neural Networks (DNN) are learned to extract information relevant for discriminating between speakers. ...
doi:10.1109/eusipco.2015.7362751
dblp:conf/eusipco/RouvierBF15
fatcat:fni33mx5dvg6rl3e36bootwo5y
Speaker Verification Using Deep Neural Networks: A Review
2019
International Journal of Machine Learning and Computing
Deep learning is usually the crux of attention in the computer vision community for various tasks, and we believe that a comprehensive review of the current state of the art in deep learning for speaker verification ...
DNNs are used for everything from feature extraction to complete end-to-end systems for speaker verification. ...
Fig. 1. Deep bottleneck features used for GMM-UBM/i-vector.
Fig. 2. Deep features systems used for speaker verification. ...
doi:10.18178/ijmlc.2019.9.1.760
fatcat:dskecbzey5eyhak5zv7wzq4eyq
CNN with Phonetic Attention for Text-Independent Speaker Verification
2019
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
With the incorporation of spoken content and attention mechanism, the system can not only distill the speaker-discriminant frames but also actively normalize the phonetic variations. ...
Text-independent speaker verification imposes no constraints on the spoken content and usually needs long observations to make reliable prediction. ...
CONCLUSIONS In this paper, we proposed an attention-based deep convolutional network using phonetic information for text-independent speaker verification. ...
doi:10.1109/asru46091.2019.9003826
dblp:conf/asru/ZhouZLGW19
fatcat:a7wfr4mcgbad7ayils23aajidy
Speaker Diarization Through Speaker Embeddings
2015
Zenodo
In that context, the hidden layers of the Deep Neural Networks (DNN) are learned to extract information relevant for discriminating between speakers. ...
This speaker verification step has been successfully performed with PLDA in previous work [1]. PLDA is a probabilistic version of Linear Discriminant Analysis (LDA). ...
doi:10.5281/zenodo.38841
fatcat:bxtt7v52ujelndvdwisn4pugs4
Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System
[article]
2018
arXiv
pre-print
In terms of the loss function for open-set speaker verification, center loss and angular softmax loss are introduced in the end-to-end system to obtain more discriminative speaker embeddings. ...
First, a unified and interpretable end-to-end system for both speaker and language recognition is developed. It accepts variable-length input and produces an utterance level result. ...
He gives insightful advice on the implementation of end-to-end discriminative loss. This research was funded in part by the National Natural Science ...
arXiv:1804.05160v1
fatcat:5ar3oyo23zb5hcnrhozpvpx6cq
One-class Learning Towards Synthetic Voice Spoofing Detection
[article]
2021
arXiv
pre-print
Human voices can be used to authenticate the identity of the speaker, but the automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks, such as impersonation, replay, text-to-speech ...
The key idea is to compact the bona fide speech representation and inject an angular margin to separate the spoofing attacks in the embedding space. ...
The ASVspoof challenge series [6, 7, 3] has been providing datasets and metrics for anti-spoofing speaker verification research. ...
arXiv:2010.13995v2
fatcat:le4vukrvavfmdmjx2rzbh35ziy
DropClass and DropAdapt: Dropping classes for deep speaker representation learning
[article]
2020
arXiv
pre-print
Empirically, this has been shown to produce speaker-discriminative embeddings, even for unseen speakers. ...
Many recent works on deep speaker embeddings train their feature extraction networks on large classification tasks, distinguishing between all speakers in a training set. ...
Conclusion In this work we presented the DropClass and DropAdapt methods for training and fine-tuning deep speaker embeddings. ...
arXiv:2002.00453v1
fatcat:dmrcvmp7lbcz5i5otg3jh2xc6i