Kernel eigenvoice speaker adaptation

B. Mak, J.T. Kwok, S. Ho
2005 IEEE Transactions on Speech and Audio Processing  
adapted models found by EV, MAP, or MLLR adaptation using 2.1 and 4.1 s of speech.  ...  However, unlike the standard eigenvoice (EV) method, an adapted speaker model found by the kernel eigenvoice method resides in the high-dimensional kernel-induced feature space, which, in general, cannot  ...  The simple algorithm was later extended to work for large-vocabulary continuous speech recognition [11], [12], eigenspace-based MLLR [13], [14], and to approximate the model prior in MAP adaptation  ... 
doi:10.1109/tsa.2005.851971 fatcat:gzcuvhbkcjdntoqnizzhtfii2a

On the Use of Different Feature Extraction Methods for Linear and Non Linear kernels [article]

Imen Trabelsi, Dorra Ben Ayed
2014 arXiv   pre-print
The speech feature extraction has been a key focus in robust speech recognition research; it significantly affects the recognition performance.  ...  Based on this, a comparative evaluation of these features is performed on the task of text independent speaker identification using a combination between Gaussian mixture models (GMM) and linear and non-linear  ...  RASTA-filtering Rasta-filtering was proposed for robust speech recognition by Hermansky and Morgan [HER 94].  ... 
arXiv:1406.7314v1 fatcat:syiy6p3klfepvakmd2hp6zgd6q

Improved speaker recognition when using i-vectors from multiple speech sources

Mitchell McLaren, David van Leeuwen
2011 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
This paper presents a comparative study of a number of subspace training techniques and a novel source-normalised-and-weighted LDA algorithm for the purpose of improving i-vector-based speaker recognition  ...  Results from the NIST 2010 speaker recognition evaluation (SRE) suggest that accounting for source conditions in the LDA matrix as opposed to the total variability subspace training regime provides improved  ...  INTRODUCTION The introduction of i-vectors as features for speaker recognition has recently amounted to a new standard in state-of-the-art technology [1, 2].  ... 
doi:10.1109/icassp.2011.5947594 dblp:conf/icassp/McLarenL11a fatcat:l7bkfpdn45gcpcbyqd52hsf7au

Features and Model Adaptation Techniques for Robust Speech Recognition: A Review

Kapang Legoh, Utpal Bhattacharjee, T. Tuithung
2015 Communications on Applied Electronics  
This paper may be useful as a tutorial and review on state-of-the-art techniques for feature selection, feature normalization and model adaptation techniques for development of robust speech recognition  ...  In this paper, major speech features used in state-of-the-art technology in speech recognition research are reviewed.  ...  For example, an ideal feature for speaker identification would have large speaker variability between speakers but for speech recognition speaker variability must be small or minimum.  ... 
doi:10.5120/cae-1507 fatcat:cbvzysewanet7jmfqmzluhwvpy

On automatic voice casting for expressive speech: Speaker recognition vs. speech classification

Nicolas Obin, Axel Roebel, Gregoire Bachman
2014 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In a subjective experiment conducted in the real-context of voice casting for video games, the multi-label system clearly outperforms standard speaker recognition systems.  ...  This indicates evidence that speech classes successfully capture the principal directions that are used in the perception of voice similarity.  ...  The speaker recognition performance is proved to be robust to expressive variability of the speaker, and to variability in duration of the speech recordings. Table 2.  ... 
doi:10.1109/icassp.2014.6853737 dblp:conf/icassp/ObinRB14 fatcat:wz3ydfi2dvhpfb4rkobxf33bi4

Diagnosis of depression by behavioural signals

Nicholas Cummins, Jyoti Joshi, Abhinav Dhall, Vidhyasaharan Sethu, Roland Goecke, Julien Epps
2013 Proceedings of the 3rd ACM international workshop on Audio/visual emotion challenge - AVEC '13  
Points in a Bag-of-Words approach for the vision subsystem.  ...  Key results include the strong performance of acoustic audio features and the bag-of-words visual features in predicting an individual's level of depression using regression.  ...  Jia Min Karen Kua for her speech processing advice and code and Sharifa Alghowinem for her insights and advice.  ... 
doi:10.1145/2512530.2512535 dblp:conf/mm/CumminsJDSGE13 fatcat:6faosmojevdbfcjsyb3ay6liyq

An Overview of Speaker Identification: Accuracy and Robustness Issues

Roberto Togneri, Daniel Pullella
2011 IEEE Circuits and Systems Magazine  
The feature compensation approach for robust speaker recognition.  ...  MFCC Features For speaker recognition it is important to extract features from each frame which can capture the speaker-specific characteristics.  ...  Noisy Testing Speech Framed Time-Domain Signal  ... 
doi:10.1109/mcas.2011.941079 fatcat:jnp75b7tjvaq5f3jfyoroolmuy

Frame Selection for Text-independent Speaker Recognition

Abedenebi Rouigueb, Malek Nadil, Abderrahmane Tikourt
2017 Proceedings of the 14th International Joint Conference on e-Business and Telecommunications  
In this paper, we propose a set of criteria for the selection of the most relevant frames in order to improve text-independent speaker automatic recognition (TISAR) task.  ...  Experiments are conducted on the MOBIO database and show that the selection allows an improvement in complexity (time and space) and in speaker identification rate, which is appropriate for real-time TISAR  ...  For each individual, 12 sessions were captured where 192 utterances are recorded by mobile phone (NOKIA N93i).  ... 
doi:10.5220/0006392100510057 dblp:conf/sigmap/RouiguebNT17 fatcat:paisdgooynde3pzxolvrowavf4

Source-normalised-and-weighted LDA for robust speaker recognition using i-vectors

Mitchell McLaren, David van Leeuwen
2011 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
., telephone, microphone and interview speech sources) for each speaker.  ...  Proposed is a novel source-normalised-and-weighted LDA algorithm developed to improve the robustness of i-vector-based speaker recognition under both mis-matched evaluation conditions and conditions for  ...  Consequently, robust speaker recognition is challenging when nontelephone speech is encountered during system evaluation, particularly in the case of mis-matched trials (i.e., train on microphone speech  ... 
doi:10.1109/icassp.2011.5947593 dblp:conf/icassp/McLarenL11 fatcat:rlnahom2tjchtlj24lk2wjigjy

Rapid speaker adaptation in eigenvoice space

R. Kuhn, J.-C. Junqua, P. Nguyen, N. Niedzielski
2000 IEEE Transactions on Speech and Audio Processing  
Experimental results for a small-vocabulary task (letter recognition) given in the paper show that the approach yields major improvements in performance for tiny amounts of adaptation data.  ...  His current research interests are rapid speaker adaptation, acoustic modeling, and speaker identification/verification.  ...  The obvious parallel in speech research to face recognition is not speech recognition but speaker recognition (usually called "speaker identification and verification").  ... 
doi:10.1109/89.876308 fatcat:srqy3fx6gfeaxnjwdazofr577u

Comparison of Generative and Discriminative Approaches for Speaker Recognition with Limited Data

J. Silovsky, P. Cerva, J. Zdansky
2009 Radioengineering  
Performed experiments are specific particularly for the very limited amount of data used for both speaker enrollment (typically ranging from 30 to 60 seconds) and recognition (typically ranging from 5  ...  This paper presents a comparison of three different speaker recognition methods deployed in a broadcast news processing system.  ...  Acknowledgements The research was supported by Czech Science Foundation (GACR) grants no. 102/07/P430 and 102/08/0707.  ... 
doaj:5b0a7c3b1af34a41a647658c867cebad fatcat:xulxjn4dcncnnbzzbpspdmx4la

Investigation of Deep Neural Network for Speaker Recognition

A. V. N. S. Bhavana
2020 International Journal for Research in Applied Science and Engineering Technology  
In this paper, deep neural networks are investigated for speaker recognition. Deep neural networks (DNN) are recently proposed for this task.  ...  Evaluations of models were performed on 10, 100, 300 speakers of testing data with 2.5 hours for every speaker utterance.  ...  This work will be aided by the increasing use of reliable speech recognition systems for speaker recognition R&D.  ... 
doi:10.22214/ijraset.2020.6140 fatcat:yhhslaqbarfw5mv4rh5y42onti

The Delta-Phase Spectrum With Application to Voice Activity Detection and Speaker Recognition

Iain McCowan, David Dean, Mitchell McLaren, Robert Vogt, Sridha Sridharan
2011 IEEE Transactions on Audio, Speech, and Language Processing  
Sridha (2011) The delta-phase spectrum with application to voice activity detection and speaker recognition. IEEE Transactions on Audio, Speech, and Language Processing.  ...  The research was supported in part by the Australian Research Council (ARC) Discovery Grant DP0877835.  ...  ACKNOWLEDGEMENTS The authors wish to thank the reviewers for their valuable comments which have enabled us to improve the quality of the manuscript, as well as Dan Ellis of Columbia University for his help  ... 
doi:10.1109/tasl.2011.2109379 fatcat:dqsvdzwrqzht3iqrqz6bp4ckaa

Diffusion maps for PLDA-based speaker verification

Oren Barkan, Hagai Aronowitz
2013 2013 IEEE International Conference on Acoustics, Speech and Signal Processing  
During the last few years, i-vectors have become an important component in most state-of-the-art speaker recognition systems.  ...  I-vector extraction is based on an assumption that GMM supervectors reside on a low dimensional space, which is modeled using Factor Analysis.  ...  robust and accurate framework for speaker verification.  ... 
doi:10.1109/icassp.2013.6639149 dblp:conf/icassp/BarkanA13 fatcat:47oupwjpdbh3pht4xkseyu4le4

An improved uncertainty propagation method for robust i-vector based speaker recognition [article]

Dayana Ribas, Emmanuel Vincent
2019 arXiv   pre-print
The performance of automatic speaker recognition systems degrades when facing distorted speech data containing additive noise and/or reverberation.  ...  We conduct experiments on the NIST-SRE corpus mixed with real domestic noise and reverberation from the CHiME-2 corpus and preprocessed by multichannel speech enhancement.  ...  Also we plan to extend the experimental setup for including female speakers. ACKNOWLEDGMENTS Authors would like to thank Dr.  ... 
arXiv:1902.05761v2 fatcat:euyldqwa5rf3hkhrfej5pq6qom