Decision Making Based on Cohort Scores for Speaker Verification [article]

Lantian Li, Renyu Wang, Gang Wang, Caixia Wang, Thomas Fang Zheng
2016 arXiv   pre-print
Decision making is an important component in a speaker verification system.  ...  For the conventional GMM-UBM architecture, the decision is usually conducted based on the log likelihood ratio of the test utterance against the GMM of the claimed speaker and the UBM.  ...  Recently, deep learning has been applied to speaker verification and gained much interest [7], [8]. Within a speaker verification system, decision making is an important component [9].  ... 
arXiv:1609.08419v1 fatcat:negwpzrryvbjjmf4wff5tkg6q4

Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition [article]

Shuai Wang, Zili Huang, Yanmin Qian, Kai Yu
2018 arXiv   pre-print
In this paper, we propose a neural network based compensation scheme (termed as deep discriminant analysis, DDA) for i-vector based speaker recognition, which shares the spirit with LDA.  ...  Optimized against softmax loss and center loss at the same time, the proposed method learns a more compact and discriminative embedding space.  ...  We term this NN-based compensation method as Deep Discriminant Analysis (DDA), for comparison with LDA or NDA.  ... 
arXiv:1805.01344v1 fatcat:asptlxvmlvetjbi7v7aac4a4ry

RSKNet-MTSP: Effective and Portable Deep Architecture for Speaker Verification [article]

Yanfeng Wu, Chenkai Guo, Junan Zhao, Xiao Jin, Jing Xu
2021 arXiv   pre-print
the depthwise separable convolutions with low-rank factorization of weight matrices.  ...  The convolutional neural network (CNN) based approaches have shown great success for speaker verification (SV) tasks, where modeling long temporal context and reducing information loss of speaker characteristics  ...  From the description, the essential part of a DNN-based SV system is to build an effective deep embedding architecture for extracting discriminative features between different speakers.  ... 
arXiv:2108.13249v1 fatcat:snrpr5lvlfeldnz525ug4rqfkm

Analysis of Length Normalization in End-to-End Speaker Verification System [article]

Weicheng Cai, Jinkun Chen, Ming Li
2018 arXiv   pre-print
The classical i-vectors and the latest end-to-end deep speaker embeddings are the two representative categories of utterance-level representations in automatic speaker verification systems.  ...  In this paper, we explore how the neural network learns length-normalized deep speaker embeddings in an end-to-end manner.  ...  is the output categories, y_i is the deep normalized embedding, c_i is the corresponding ground truth label, and W and b are the weights and bias for the last layer of the network which acts as a back-end  ... 
arXiv:1806.03209v2 fatcat:kta6anuhsjcyjjgsjswow2oaim

On Residual CNN in text-dependent speaker verification task [article]

Egor Malykh, Sergey Novoselov, Oleg Kudashev
2017 arXiv   pre-print
Deep learning approaches are still not very common in the speaker verification field.  ...  We investigate the possibility of using a deep residual convolutional neural network with spectrograms as input features in the text-dependent speaker verification task.  ...  A speaker discriminative approach is the most natural way for speaker verification. [12] describes a DNN for extracting a small speaker footprint which can be used to discriminate between speakers.  ... 
arXiv:1705.10134v2 fatcat:xql2kkkjtrfrzfutwi6owtqbdy

Analysis of Length Normalization in End-to-End Speaker Verification System

Weicheng Cai, Jinkun Chen, Ming Li
2018 Interspeech 2018  
The classical i-vectors and the latest end-to-end deep speaker embeddings are the two representative categories of utterance-level representations in automatic speaker verification systems.  ...  In this paper, we explore how the neural network learns length-normalized deep speaker embeddings in an end-to-end manner.  ...  is the output categories, y_i is the deep normalized embedding, c_i is the corresponding ground truth label, and W and b are the weights and bias for the last layer of the network which acts as a back-end  ... 
doi:10.21437/interspeech.2018-92 dblp:conf/interspeech/CaiCL18 fatcat:fn754g4sa5gwzej5z662j3s7yq

Attention Mechanism in Speaker Recognition: What Does It Learn in Deep Speaker Embedding? [article]

Qiongqiong Wang and Koji Okabe and Kong Aik Lee and Hitoshi Yamamoto and Takafumi Koshinaka
2018 arXiv   pre-print
This paper presents an experimental study on deep speaker embedding with an attention mechanism that has been found to be a powerful representation learning technique in speaker recognition.  ...  In this framework, an attention model works as a frame selector that computes an attention weight for each frame-level feature vector, in accord with which an utterance-level representation is produced  ...  deep speaker embedding network, and (3) Applying attention weights to statistics for i-vector extraction.  ... 
arXiv:1809.09311v1 fatcat:dmoazbooffgmpdde4vunw5i63e

Partial AUC optimization based deep speaker embeddings with class-center learning for text-independent speaker verification [article]

Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen
2019 arXiv   pre-print
Deep embedding based text-independent speaker verification has demonstrated superior performance to traditional methods in many challenging scenarios.  ...  Thus, most state-of-the-art deep embedding methods use the identification loss functions with softmax output units or their variants.  ...  CONCLUSIONS This paper presented a method to train deep embedding based text-independent speaker verification with a new verification loss function-pAUC.  ... 
arXiv:1911.08077v1 fatcat:3niehrfdgjavfnuemrdrrcraay

Speaker diarization through speaker embeddings

Mickael Rouvier, Pierre-Michel Bousquet, Benoit Favre
2015 2015 23rd European Signal Processing Conference (EUSIPCO)  
This paper proposes to learn a set of high-level feature representations through deep learning, referred to as Speaker Embeddings, for speaker diarization.  ...  Although learned through identification, speaker embeddings are shown to be effective for speaker verification in particular to recognize speakers unseen in the training set.  ...  In that context, the hidden layers of the Deep Neural Networks (DNN) are learned to extract information relevant for discriminating between speakers.  ... 
doi:10.1109/eusipco.2015.7362751 dblp:conf/eusipco/RouvierBF15 fatcat:fni33mx5dvg6rl3e36bootwo5y

Speaker Verification Using Deep Neural Networks: A Review

Amna Irum, School of Electrical Engineering and Computer Sciences (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan, Ahmad Salman
2019 International Journal of Machine Learning and Computing  
Usually deep learning is crux of attention in computer vision community for various tasks and we believe that a comprehensive review of current state-of-the-art in deep learning for speaker verification  ...  DNN are used from extracting features to complete end-to-end system for speaker verification.  ...  Fig. 1. Deep bottleneck features used for GMM-UBM/i-vector. Fig. 2. Deep features systems used for speaker verification.  ... 
doi:10.18178/ijmlc.2019.9.1.760 fatcat:dskecbzey5eyhak5zv7wzq4eyq

CNN with Phonetic Attention for Text-Independent Speaker Verification

Tianyan Zhou, Yong Zhao, Jinyu Li, Yifan Gong, Jian Wu
2019 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)  
With the incorporation of spoken content and attention mechanism, the system can not only distill the speaker-discriminant frames but also actively normalize the phonetic variations.  ...  Text-independent speaker verification imposes no constraints on the spoken content and usually needs long observations to make reliable prediction.  ...  CONCLUSIONS In this paper, we proposed an attention-based deep convolutional network using phonetic information for text-independent speaker verification.  ... 
doi:10.1109/asru46091.2019.9003826 dblp:conf/asru/ZhouZLGW19 fatcat:a7wfr4mcgbad7ayils23aajidy

Speaker Diarization Through Speaker Embeddings

Pierre-Michel Bousquet, Benoit Favre, Mickael Rouvier
2015 Zenodo  
In that context, the hidden layers of the Deep Neural Networks (DNN) are learned to extract information relevant for discriminating between speakers.  ...  This speaker verification step has been successfully performed with PLDA in previous work [1]. PLDA is a probabilistic version of Linear Discriminant Analysis (LDA).  ... 
doi:10.5281/zenodo.38841 fatcat:bxtt7v52ujelndvdwisn4pugs4

Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System [article]

Weicheng Cai, Jinkun Chen, Ming Li
2018 arXiv   pre-print
In terms of loss function for open-set speaker verification, to get more discriminative speaker embedding, center loss and angular softmax loss are introduced in the end-to-end system.  ...  First, a unified and interpretable end-to-end system for both speaker and language recognition is developed. It accepts variable-length input and produces an utterance level result.  ...  He gives insightful advice on the implementation of end-to-end discriminative loss. This research was funded in part by the National Natural Science  ... 
arXiv:1804.05160v1 fatcat:5ar3oyo23zb5hcnrhozpvpx6cq

One-class Learning Towards Synthetic Voice Spoofing Detection [article]

You Zhang, Fei Jiang, Zhiyao Duan
2021 arXiv   pre-print
Human voices can be used to authenticate the identity of the speaker, but the automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks, such as impersonation, replay, text-to-speech  ...  The key idea is to compact the bona fide speech representation and inject an angular margin to separate the spoofing attacks in the embedding space.  ...  The ASVspoof challenge series [6, 7, 3] has been providing datasets and metrics for anti-spoofing speaker verification research.  ... 
arXiv:2010.13995v2 fatcat:le4vukrvavfmdmjx2rzbh35ziy

DropClass and DropAdapt: Dropping classes for deep speaker representation learning [article]

Chau Luu, Peter Bell, Steve Renals
2020 arXiv   pre-print
Empirically, this has been shown to produce speaker-discriminative embeddings, even for unseen speakers.  ...  Many recent works on deep speaker embeddings train their feature extraction networks on large classification tasks, distinguishing between all speakers in a training set.  ...  Conclusion In this work we presented the DropClass and DropAdapt methods for training and fine-tuning deep speaker embeddings.  ... 
arXiv:2002.00453v1 fatcat:dmrcvmp7lbcz5i5otg3jh2xc6i