Dynamic Margin Softmax Loss for Speaker Verification

Dao Zhou, Longbiao Wang, Kong Aik Lee, Yibo Wu, Meng Liu, Jianwu Dang, Jianguo Wei
2020 Interspeech 2020  
We propose a dynamic-margin softmax loss for the training of deep speaker embedding neural network. Our proposal is inspired by the additive-margin softmax (AM-Softmax) loss reported earlier.  ...  Thus, it is more reasonable to set a dynamic margin for each training sample. In this paper, we propose a dynamic cosine margin softmax.  ... 
doi:10.21437/interspeech.2020-1106 dblp:conf/interspeech/ZhouWLWLDW20 fatcat:iaqjbpklbjgnln7scxamd3ju7m

Beijing ZKJ-NPU Speaker Verification System for VoxCeleb Speaker Recognition Challenge 2021 [article]

Li Zhang, Huan Zhao, Qinling Meng, Yanli Chen, Min Liu, Lei Xie
2021 arXiv   pre-print
We participated in the fully supervised speaker verification track 1 and track 2.  ...  With our submission, we came to the second place in the challenge for both tracks.  ...  As compared with the softmax loss, the additive angular margin loss (AAM-Softmax) [29] is more popular in speaker verification as increasing intra-speaker distances and ensuring inter-speaker compactness  ... 
arXiv:2109.03568v2 fatcat:46prulxrerdohcikt7dfmaq2u4

Adaptive Margin Circle Loss for Speaker Verification [article]

Runqiu Xiao
2021 arXiv   pre-print
In this paper, we propose a novel angular loss function called adaptive margin circle loss for speaker verification.  ...  Deep-Neural-Network (DNN) based speaker verification systems use the angular softmax loss with margin penalties to enhance the intra-class compactness of speaker embeddings, which achieved remarkable performance  ...  In speaker verification, the cross-entropy loss function with softmax is most widely used for training the speaker embedding model.  ... 
arXiv:2106.08004v1 fatcat:onecfq6rmjeh7aubjzkhf27svu

Deep Speaker Embedding with Long Short Term Centroid Learning for Text-Independent Speaker Verification

Junyi Peng, Rongzhi Gu, Yuexian Zou
2020 Interspeech 2020  
Recently, speaker verification systems using deep neural networks have shown their effectiveness on large scale datasets.  ...  Finally, these centroids are employed to construct a loss function, named long short term speaker loss (LSTSL).  ...  Using this pipeline, more well-designed multi-class classification loss functions such as angular softmax loss (ASoftmax), additive margin softmax loss (AMSoftmax), additive angular margin softmax (ArcSoftmax  ... 
doi:10.21437/interspeech.2020-2470 dblp:conf/interspeech/PengGZ20 fatcat:s6sq6ix3zjbe7hhr2xsfcjt5fy

Deep Speaker Embedding Extraction with Channel-Wise Feature Responses and Additive Supervision Softmax Loss Function

Jianfeng Zhou, Tao Jiang, Zheng Li, Lin Li, Qingyang Hong
2019 Interspeech 2019  
In speaker verification, the convolutional neural networks (CNN) have been successfully leveraged to achieve a great performance.  ...  Additionally, we propose a new loss function, namely additive supervision softmax (AS-Softmax), to make full use of the prior knowledge of the mis-classified samples at training stage by imposing more  ...  suitable for the speaker verification task.  ... 
doi:10.21437/interspeech.2019-1704 dblp:conf/interspeech/ZhouJLLH19 fatcat:6va5knr4cnf4lhh2mjlpecybua

ARET: Aggregated Residual Extended Time-Delay Neural Networks for Speaker Verification

Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Longbiao Wang, Meng Liu, Lin Zhang, Jiayu Jin, Junhai Xu
2020 Interspeech 2020  
The time-delay neural network (TDNN) is widely used in speaker verification to extract long-term temporal features of speakers.  ...  Although common TDNN approaches well capture time-sequential information, they lack the delicate transformations needed for deep representation.  ...  End-to-end speaker verification models are augmented by large margin softmax losses [11, 12, 13, 14, 15, 4].  ... 
doi:10.21437/interspeech.2020-1626 dblp:conf/interspeech/ZhangWLWLZJX20 fatcat:by5pzsk46rhbpnuzifvb5sunku

VoxCeleb: Large-scale Speaker Verification in the Wild

Arsha Nagrani, Joon Son Chung, Weidi Xie, Andrew Zisserman
2019 Computer Speech and Language  
Our pipeline involves obtaining videos from YouTube; performing active speaker verification using a two-stream synchronization Convolutional Neural Network (CNN), and confirming the identity of the speaker  ...  Most existing datasets for speaker identification contain samples obtained under quite constrained conditions, and usually require manual annotations, hence are limited in size.  ...  Acknowledgements Funding for this research is provided by the EPSRC Programme Grant Seebibyte EP/M013774/1. AN is supported by a Google Ph.D.  ... 
doi:10.1016/j.csl.2019.101027 fatcat:ih2gshb7pfhgdlnsx7hj3s7oka

Densely Connected Time Delay Neural Network for Speaker Verification

Ya-Qi Yu, Wu-Jun Li
2020 Interspeech 2020  
Time delay neural network (TDNN) has been widely used in speaker verification tasks.  ...  Cosine-based Softmax Loss Softmax-based cross-entropy loss, also called softmax loss, is usually adopted for training multi-class classifiers.  ...  Angular additive margin softmax (AAM-Softmax) loss [18] adds a margin m on the angular values and ψ(·) is defined as: ψ(cos θ) = cos(θ + m).  ... 
doi:10.21437/interspeech.2020-1275 dblp:conf/interspeech/YuL20 fatcat:twsxomgknndnhpvsmajldgzwjq
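
The AAM-Softmax definition quoted in the snippet above, ψ(cos θ) = cos(θ + m), can be illustrated with a minimal NumPy sketch. This is not code from any of the listed papers: the function name `aam_softmax_loss`, the argument names, and the default scale and margin are assumptions chosen for illustration, and the sketch omits the handling that practical implementations usually add for the case θ + m > π.

```python
import numpy as np

def aam_softmax_loss(embeddings, weights, labels, s=30.0, m=0.2):
    """Illustrative AAM-Softmax loss (hypothetical helper, not a library API).

    embeddings: (N, D) speaker embeddings
    weights:    (C, D) one weight vector per class
    labels:     (N,)   integer class indices
    s, m:       scale and angular margin (illustrative defaults)
    """
    # L2-normalise both sides so the dot product is cos(theta).
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)

    # Apply psi(cos theta) = cos(theta + m) to the target class only.
    rows = np.arange(len(labels))
    theta_target = np.arccos(cos[rows, labels])
    logits = cos.copy()
    logits[rows, labels] = np.cos(theta_target + m)
    logits *= s

    # Numerically stable softmax cross-entropy.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()
```

With m = 0 the function reduces to a softmax loss on scaled cosine similarities; a positive m lowers the target-class logit, so a sample must sit closer in angle to its class weight to reach the same loss.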

Dynamic Multi-scale Convolution for Dialect Identification [article]

Tianlong Kong, Shouyi Yin, Dawei Zhang, Wang Geng, Xin Wang, Dandan Song, Jinwen Huang, Huiyu Shi, Xiaorui Wang
2021 arXiv   pre-print
Local multi-scale learning, which represents multi-scale features at a granular level, is able to increase the range of receptive fields for convolution operation.  ...  To address this issue, we propose a new architecture, named dynamic multi-scale convolution, which consists of dynamic kernel convolution, local multi-scale learning, and global multi-scale pooling.  ...  For each language, a segment with 200 to 400 frames is sliced from the utterances. Additive Angular Margin Softmax (AAM-Softmax) [29] loss is used to train the baseline system.  ... 
arXiv:2108.07787v1 fatcat:cfykz4cfnrbbzn4l66cv2cwzx4

Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning [article]

Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim
2020 arXiv   pre-print
Also, to preserve speaker discriminability, a contrastive similarity loss function is used together.  ...  In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors  ...  In these experiments, we set the cosine margin m=0.2 and scale factor s=30, • ArcFace [29]: Additive angular margin softmax loss, also called AAM-softmax loss.  ... 
arXiv:2010.11433v1 fatcat:nkhnvzz5vfcnhkvgh6vyrkarpe

Y-Vector: Multiscale Waveform Encoder for Speaker Embedding [article]

Ge Zhu, Fei Jiang, Zhiyao Duan
2021 arXiv   pre-print
We show that the proposed embeddings outperform existing raw-waveform-based speaker embeddings on speaker verification by a large margin.  ...  State-of-the-art text-independent speaker verification systems typically use cepstral features or filter bank energies as speech features.  ...  As for the AM-Softmax loss function, the scale factor and margin are set to 30 and 0.35 respectively.  ... 
arXiv:2010.12951v3 fatcat:poppjxmnvbgwzmyrijzekgskza

Learning Speaker Embedding with Momentum Contrast [article]

Ke Ding and Xuanji He and Guanglu Wan
2020 arXiv   pre-print
Speaker verification can be formulated as a representation learning task, where speaker-discriminative embeddings are extracted from utterances of variable lengths.  ...  In this work, we apply MoCo to learn speaker embedding from speech segments. We explore MoCo for both unsupervised learning and pretraining settings.  ...  X-vector with AAM loss Additive Angular Margin (AAM) loss is reported more effective than the conventional cross entropy loss for both face [10] and speaker verification tasks [12] .  ... 
arXiv:2001.01986v2 fatcat:plp5y2rvg5cidgmd3tlbb47nre

Additive Phoneme-aware Margin Softmax Loss for Language Recognition [article]

Zheng Li, Yan Liu, Lin Li, Qingyang Hong
2021 arXiv   pre-print
This paper proposes an additive phoneme-aware margin softmax (APM-Softmax) loss to train the multi-task learning network with phonetic information for language recognition.  ...  In additive margin softmax (AM-Softmax) loss, the margin is set as a constant during the entire training for all training samples, and that is a suboptimal method since the recognition difficulty varies  ...  Thus, it is more reasonable to use a dynamic margin or a dynamic angular margin based on phonetic information to improve AM-Softmax loss or AAM-Softmax loss.  ... 
arXiv:2106.12851v1 fatcat:27l34bmtvnarpkkddykp7mj5km

On Training Targets and Activation Functions for Deep Representation Learning in Text-Dependent Speaker Verification [article]

Achintya kr. Sarkar, Zheng-Hua Tan
2022 arXiv   pre-print
Deep representation learning has gained significant momentum in advancing text-dependent speaker verification (TD-SV) systems.  ...  Among the various loss functions, cross entropy, joint-softmax and focal loss functions outperform the others. Finally, score-level fusion of different systems is also able to reduce the error rates.  ...  Therefore, softmax function with angular margin is introduced in [13] for face recognition and the learned feature with this loss function will be angularly distributed.  ... 
arXiv:2201.06426v1 fatcat:kt7wb4tfs5eqfc5t5vnqncyj6i

Within-sample variability-invariant loss for robust speaker recognition under noisy environments [article]

Danwei Cai, Weicheng Cai, Ming Li
2020 arXiv   pre-print
Specifically, the network is trained with the original speaker identification loss with an auxiliary within-sample variability-invariant loss.  ...  Experiments on VoxCeleb1 indicate that the proposed training framework improves the performance of the speaker verification system in both clean and noisy conditions.  ...  For the speaker identification loss, a standard softmax-based cross-entropy loss or angular softmax loss (A-softmax) [27] is used.  ... 
arXiv:2002.00924v2 fatcat:5b3honknwjcfdbhx3tjtnuhyzm