Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition [article]

Shuai Wang, Zili Huang, Yanmin Qian, Kai Yu
2018 arXiv   pre-print
Linear Discriminant Analysis (LDA) has been used as a standard post-processing procedure in many state-of-the-art speaker recognition tasks.  ...  In this paper, we propose a neural network based compensation scheme (termed as deep discriminant analysis, DDA) for i-vector based speaker recognition, which shares the spirit with LDA.  ...  Probabilistic Linear Discriminant Analysis i-vectors with Probabilistic Linear Discriminant Analysis (PLDA) back-end obtains the state-of-the-art performance in speaker verification.  ...
arXiv:1805.01344v1 fatcat:asptlxvmlvetjbi7v7aac4a4ry

Robust Speaker Verification with Principal Pitch Components

Robert M. Nickel, Sachin P. Oswal, Ananth N. Iyer
2005 International Journal of Speech Technology  
limit of the cepstral analysis.  ...  We are presenting a new method that improves the accuracy of text dependent speaker identification systems.  ...  For text dependent speaker verification, cepstral features exhibit a discriminative power that is, as of now, unsurpassed by any other feature representation for speech [1].  ...
doi:10.1007/s10772-006-9048-4 fatcat:xhkcyqlk7bdlfg5cep4ac6ybnu

Robust speech analysis by lag-weighted linear prediction

Jouni Pohjalainen, Paavo Alku
2012 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
This study introduces an approach for linear predictive spectrum analysis based on emphasizing selected time-domain properties in the analyzed signal in combination with a stabilization operation.  ...  A stable weighted linear predictive method based on a novel autocorrelation-based weighting scheme is described and its spectral properties are demonstrated.  ...  Recently, temporally weighted linear prediction [5] with its many variants has been applied (by the present authors) to text independent speaker verification [4] [6] and large vocabulary continuous  ... 
doi:10.1109/icassp.2012.6288908 dblp:conf/icassp/PohjalainenA12 fatcat:h7ysjoesgnehtgzv4welrqqiw4

Brief Review of Short Utterance Speaker Verification Systems

Asmita Nirmal
2020 Bioscience Biotechnology Research Communications  
Due to technological improvements many methods have been proposed for speaker verification.  ...  In this paper we primarily emphasis on the survey of different feature extraction methods for text-independent speaker verification. We first briefly review conventional systems to show its progress.  ...  of deep features in a tandem method for speaker verification is studied by Fu et al., 2014. Phone discriminant and speaker discriminant DNN are combined with conventional acoustic features and applied  ...
doi:10.21786/bbrc/13.14/95 fatcat:acnfuhrdgzdinfs37z4rs4rtf4

Boosted binary features for noise-robust speaker verification

Anindya Roy, Mathew Magimai.-Doss, Sebastien Marcel
2010 2010 IEEE International Conference on Acoustics, Speech and Signal Processing  
The standard approach to speaker verification is to extract cepstral features from the speech spectrum and model them by generative or discriminative techniques.  ...  The final classifier is a simple linear combination of these selected features.  ...  Nelson Morgan and Dr. Francesco Orabona for their comments and advice.  ...
doi:10.1109/icassp.2010.5495622 dblp:conf/icassp/RoyMM10 fatcat:duvv34xg2benvnj6jd2l6bu26q

End-to-End attention based text-dependent speaker verification

Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Yifan Gong
2016 2016 IEEE Spoken Language Technology Workshop (SLT)  
Previously, using the phonetic/speaker discriminative DNNs as feature extractors for speaker verification has shown promising results.  ...  In this work we use speaker discriminative CNNs to extract the noise-robust frame-level features.  ...  In this paper we use speaker discriminative CNNs to extract noise robust frame-level features.  ... 
doi:10.1109/slt.2016.7846261 dblp:conf/slt/ZhangCZLG16 fatcat:ar7kuwixsbalbigtu5aghppll4

Recent advances in biometric person authentication

Dugelay, Junqua, Kotropoulos, Kuhn, Perronnin, Pitas
2002 IEEE International Conference on Acoustics Speech and Signal Processing  
While enabling technologies (e.g. audio, video) for biometrics have mostly used separately, ultimately, biometric technologies could find their strongest role as intertwined and complementary pieces of  ...  ) that are provided by Linear Discriminant Analysis (LDA) or Fisher Linear Discriminant (FLD) [21].  ...  discriminants [29], optimized robust correlation [30], EGM that employs either multiscale dilation-erosion and combines linear projections at the graph nodes [31] [32], or morphological signal  ...
doi:10.1109/icassp.2002.1004810 fatcat:ychkz3csa5bzjpmurfdjotn7sy

Recent advances in biometric person authentication

J.-L. Dugelay, J.-C. Junqua, C. Kotropoulos, R. Kuhn, F. Perronnin, I. Pitas
2002 IEEE International Conference on Acoustics Speech and Signal Processing  
While enabling technologies (e.g. audio, video) for biometrics have mostly used separately, ultimately, biometric technologies could find their strongest role as intertwined and complementary pieces of  ...  ) that are provided by Linear Discriminant Analysis (LDA) or Fisher Linear Discriminant (FLD) [21].  ...  discriminants [29], optimized robust correlation [30], EGM that employs either multiscale dilation-erosion and combines linear projections at the graph nodes [31] [32], or morphological signal  ...
doi:10.1109/icassp.2002.5745549 dblp:conf/icassp/DugelayJKKPP02 fatcat:c2vfd4lecbg5hi6xdxtnzay4bi

Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification [article]

Lanhua You, Wu Guo, Lirong Dai, Jun Du
2019 arXiv   pre-print
The x-vector based deep neural network (DNN) embedding systems have demonstrated effectiveness for text-independent speaker verification.  ...  The proposed training strategy aggregates both the supervised and unsupervised learning into one framework to make the speaker embeddings more discriminative and robust.  ...  Combined with the probabilistic linear discriminant analysis (PLDA) [2] backend, the i-vector/PLDA framework has become the dominant approach for the last decade.  ... 
arXiv:1903.12058v2 fatcat:dlreitygybhtrhnjtpyotdjp2m

Deep Speaker Embedding with Long Short Term Centroid Learning for Text-Independent Speaker Verification

Junyi Peng, Rongzhi Gu, Yuexian Zou
2020 Interspeech 2020  
Since the long-term speaker embedding centroids are associated with a wide range of training samples, these centroids have the potential to be more robust and discriminative.  ...  Recently, speaker verification systems using deep neural networks have shown their effectiveness on large scale datasets.  ...  The combination of i-vector and Probabilistic Linear Discriminant Analysis (PLDA) has dominated for over 10 years [2] .  ... 
doi:10.21437/interspeech.2020-2470 dblp:conf/interspeech/PengGZ20 fatcat:s6sq6ix3zjbe7hhr2xsfcjt5fy

Local spectral variability features for speaker verification

Md Sahidullah, Tomi Kinnunen
2016 Digital signal processing (Print)  
To sum up, combining local covariance information with the traditional cepstral features holds promise as an additional speaker cue in both text-independent and text-dependent recognition.  ...  Speaker verification techniques neglect the short-time variation  ...  Acknowledgements The authors would like to thank the anonymous reviewers for their valuable comments and suggestions which have greatly helped in improving the content of this paper.  ...
doi:10.1016/j.dsp.2015.10.011 fatcat:zrqxp7mdnbccxl5tbrhqzfc5hi

Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey [article]

Hareesh Mandalapu, Aravinda Reddy P N, Raghavendra Ramachandra, K Sreenivasa Rao, Pabitra Mitra, S R Mahadeva Prasanna, Christoph Busch
2021 arXiv   pre-print
For many years, acoustic information alone has been a great success in automatic speaker verification applications.  ...  The vulnerability of biometrics towards presentation attacks and audio-visual data usage for the detection of such attacks is also a hot topic of research.  ...  LBPs features are used for face recognition using a semi-supervised discriminant analysis as an extension to linear discriminant analysis (LDA) [145] .  ... 
arXiv:2101.09725v1 fatcat:huejyfaeojhzddlckqt5nfivlq

Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey

Hareesh Mandalapu, Aravinda Reddy P N, Raghavendra Ramachandra, Krothapalli Sreenivasa Rao, Pabitra Mitra, S. R. Mahadeva Prasanna, Christoph Busch
2021 IEEE Access  
For many years, acoustic information alone has been a great success in automatic speaker verification applications.  ...  The vulnerability of biometrics towards presentation attacks and audio-visual data usage for the detection of such attacks is also a hot topic of research.  ...  LBPs features are used for face recognition using a semi-supervised discriminant analysis as an extension to linear discriminant analysis (LDA) [145] .  ... 
doi:10.1109/access.2021.3063031 fatcat:q6emam55frhlzp53t7lxb4qz3e

Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations [article]

Wei Xia, John H.L. Hansen
2020 arXiv   pre-print
The proposed modules, together with a popular ResNet based model, are evaluated on the VoxCeleb1 dataset, which is a large scale speaker verification corpus collected in the wild.  ...  In this study, we propose the global context guided channel and time-frequency transformations to model the long-range, non-local time-frequency dependencies and channel variances in speaker representations  ...  The paradigm has shifted from GMM-UBM and factor analysis based methods like i-vector [7, 8] with a probabilistic linear discriminant (PLDA) back-end [9, 10] to deep neural network based models.  ... 
arXiv:2009.00768v2 fatcat:qvk2urqeoverllrwxd64o5y6je

Speaker Representation Learning Using Global Context Guided Channel and Time-Frequency Transformations

Wei Xia, John H.L. Hansen
2020 Interspeech 2020  
The proposed modules, together with a popular ResNet based model, are evaluated on the VoxCeleb1 dataset, which is a large scale speaker verification corpus collected in the wild.  ...  In this study, we propose the global context guided channel and time-frequency transformations to model the long-range, non-local time-frequency dependencies and channel variances in speaker representations  ...  The paradigm has shifted from GMM-UBM and factor analysis based methods like i-vector [6, 7] with a probabilistic linear discriminant (PLDA) back-end [8, 9] to deep neural network based models.  ... 
doi:10.21437/interspeech.2020-1845 dblp:conf/interspeech/XiaH20 fatcat:24xmwci7tvawrni7ey3ohmtndm