29 Hits

A Survey of Speaker Recognition: Fundamental Theories, Recognition Methods and Opportunities

Muhammad Mohsin Kabir, M. F. Mridha, Jungpil Shin, Israt Jahan, Abu Quwsar Ohi
2021 IEEE Access  
to adapt neural network models. Results may change as the amount of speaker data increases  ...  Here, both the front-end and back-end tasks are performed by a single architecture. A few standard end-to-end speaker recognition architectures are Deep Speaker, RawNet, AM-MobileNet, SincNet, etc.  ... 
doi:10.1109/access.2021.3084299 fatcat:6eavwhxg6jfwngu7bnwzjc4w3q

End-to-End Domain-Adversarial Voice Activity Detection

Marvin Lavechin, Marie-Philippe Gill, Ruben Bousbib, Hervé Bredin, Leibny Paola Garcia-Perera
2020 Interspeech 2020  
First, we design a neural network combining trainable filters and recurrent layers to tackle voice activity detection directly from the waveform.  ...  To that end, a domain classification branch is added to the network and trained in an adversarial manner.  ...  We would like to thank Neville Ryant for providing the speaker diarization output of the winning submission to DIHARD 2019.  ... 
doi:10.21437/interspeech.2020-2285 dblp:conf/interspeech/LavechinGBBG20 fatcat:ox2ibrrxhjgttbqibo2c53bgkm

Dual-Path Filter Network: Speaker-Aware Modeling for Speech Separation [article]

Fan-Lin Wang, Yu-Huai Peng, Hung-Shin Lee, Hsin-Min Wang
2022 arXiv   pre-print
DPFN is composed of two parts: the speaker module and the separation module. First, the speaker module infers the identities of the speakers.  ...  All related approaches can be divided into two categories: time-frequency domain methods and time domain methods.  ...  Speaker Module for Unknown Speakers: When only the mixed waveform is given without the speaker information, we first separate the mixture through a pre-trained separation model and input the separated waveforms  ... 
arXiv:2106.07579v2 fatcat:nrdq3wxvtvakzho5aisq7c2nvi

End-to-end Domain-Adversarial Voice Activity Detection [article]

Marvin Lavechin, Marie-Philippe Gill, Ruben Bousbib, Hervé Bredin, Leibny Paola Garcia-Perera
2020 arXiv   pre-print
First, we design a neural network combining trainable filters and recurrent layers to tackle voice activity detection directly from the waveform.  ...  To that end, a domain classification branch is added to the network and trained in an adversarial manner.  ...  Reproducible research: All the code has been implemented using (and integrated into) [19], a Python toolkit to build neural networks for the speaker diarization task.  ... 
arXiv:1910.10655v2 fatcat:utt3bgphhzdfdbtmzred7nvjqy

Privacy-sensitive audio features for conversational speech processing

Sree Hari Krishnan Parthasarathi
2012 ACM SIGMultimedia Records  
Analysis of conversations can then proceed by modeling the speaker turns and durations produced by speaker diarization.  ...  Indeed, the main contributions of this thesis are the achievement of state-of-the-art performance in speech/nonspeech detection and speaker diarization tasks using such features, which we refer to as  ...  But looking back over these four memorable years, pockmarked with unending deadlines, spent working at Idiap and housed in  ... 
doi:10.1145/2206765.2206771 fatcat:rscelyhx6jer7plmdwfusgoppy

Speaker Recognition from Raw Waveform with SincNet [article]

Mirco Ravanelli, Yoshua Bengio
2019 arXiv   pre-print
Our experiments, conducted on both speaker identification and speaker verification tasks, show that the proposed architecture converges faster and performs better than a standard CNN on raw waveforms.  ...  Rather than employing standard hand-crafted features, the latter CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker  ...  This research was enabled in part by support provided by Calcul Québec and Compute Canada.  ... 
arXiv:1808.00158v3 fatcat:ox54rihverd2xmeevd6l6pjlou

Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder [article]

Manh Luong, Viet Anh Tran
2021 arXiv   pre-print
identity and linguistic content to achieve good performance on unseen speaker scenarios.  ...  The method has the capability to disentangle speaker identity and linguistic content from utterances; it can convert from many source speakers to many target speakers with a single autoencoder network.  ...  Therefore, we believe that disentangled approaches are promising not only for voice conversion but also for speaker verification and speaker diarization.  ... 
arXiv:2107.06642v1 fatcat:wh5iqfduinfovnur5yeq6cewme

2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28

2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
., +, TASLP 2020 1233-1247 Hidden Markov models Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors.  ...  Luong, H., +, TASLP 2020 2967-2981 Bayes methods Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors.  ...  Target tracking Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. Zhang, Q., +, TASLP 2020 1183-1197  ... 
doi:10.1109/taslp.2021.3055391 fatcat:7vmstynfqvaprgz6qy3ekinkt4

A Survey of Sound Source Localization with Deep Learning Methods [article]

Pierre-Amaury Grumiaux, Srđan Kitić, Laurent Girin, Alexandre Guérin
2021 arXiv   pre-print
We provide an exhaustive topography of the neural-based localization literature in this context, organized according to several aspects: the neural network architecture, the type of input features, the  ...  output strategy (classification or regression), the types of data used for model training and evaluation, and the model training strategy.  ...  The specific case where we have several speakers taking speech turns with or without overlap is strongly connected to the speaker diarization problem ("who speaks when?") [53] , [54] , [55] .  ... 
arXiv:2109.03465v2 fatcat:4zsolfkfsfgavnykr72svnjkjq

Identifying Speakers Using Deep Learning: A review

Lawchak Fadhil Khalid, Lawchak Fadhil Abdulazeez
2021 Zenodo  
Deep Neural Networks (DNNs) and Recurrent Neural Networks (RNNs) are the two main types of deep learning used to implement such applications.  ...  Speaker identification is used increasingly in daily life, and the research community is focusing on it as a result of this demand.  ...  An et al. (2019) discussed two CNN-based methods for speaker identification (SID): Visual Geometry Group (VGG) nets and Residual Neural Networks (ResNets).  ... 
doi:10.5281/zenodo.4481596 fatcat:4cpsf3b7ijc6palkqry6zb6yya

Representation Based Meta-Learning for Few-Shot Spoken Intent Recognition

Ashish Mittal, Samarth Bharadwaj, Shreya Khare, Saneem Chemmengath, Karthik Sankaranarayanan, Brian Kingsbury
2020 Interspeech 2020  
We evaluate three such approaches on our novel experimental protocol developed on two popular spoken intent classification datasets: Google Commands and the Fluent Speech Commands dataset.  ...  For a 5-shot (1-shot) classification of novel classes, the proposed framework provides an average classification accuracy of 88.6% (76.3%) on the Google Commands dataset, and 78.5% (64.2%) on the Fluent  ...  Further, the models are trained in an end-to-end fashion, thereby passing gradients computed from classification error through the linear classifier and the representational neural network simultaneously  ... 
doi:10.21437/interspeech.2020-3208 dblp:conf/interspeech/MittalBKCSK20 fatcat:2zerbq2eh5a6rbco7a2jjqrlqu

Table of Contents

2021 2021 29th Conference of Open Innovations Association (FRUCT)   unpublished
Speaker Diarization through  ...  Waveform and Neural Net  ...  On Applying Convolutional Neural Network to Bearing Fault Detection  ... 
doi:10.23919/fruct52173.2021.9435600 fatcat:maognf2vbrcu3o63lgnj56ynpq

Biometrics Recognition Using Deep Learning: A Survey [article]

Shervin Minaee, Amirali Abdolrashidi, Hang Su, Mohammed Bennamoun, David Zhang
2021 arXiv   pre-print
deploy deep learning models, and show their strengths and potential in different applications.  ...  For each biometric, we first introduce the available datasets that are widely used in the literature and their characteristics.  ...  Rama Chellappa, and Dr. Nalini Ratha for reviewing this work, and providing very helpful comments and suggestions.  ... 
arXiv:1912.00271v3 fatcat:nobon7vrrrdnxe4pr3q2anl63y

Unsupervised Learning for Expressive Speech Synthesis

Igor Jauk
2018 IberSPEECH 2018  
The main difficulty lies in the highly speaker- and situation-dependent nature of expressiveness, which causes many acoustically substantial variations.  ...  sets in the multi-speaker domain.  ...  Acknowledgements First of all I would like to thank Antonio Bonafonte for his help, lead and patience, and for the opportunity to work and to develop this work in his group.  ... 
doi:10.21437/iberspeech.2018-38 dblp:conf/iberspeech/Jauk18 fatcat:6zogjdy3gjgslfbbgrqirjzsx4

Acoustic censusing using automatic vocalization classification and identity recognition

Kuntoro Adi, Michael T. Johnson, Tomasz S. Osiejuk
2010 Journal of the Acoustical Society of America  
Individually distinct acoustic features have been observed in a wide range of animal species, and this, combined with the widespread success of speaker identification and verification methods for human  ...  The underlying algorithm is based on clustering using hidden Markov models (HMMs) and Gaussian mixture models (GMMs), similar to methods used in the speech recognition community for tasks such as speaker  ...  Figures 5 and 6 illustrate the language model and waveform-to-HMM matching process.  ... 
doi:10.1121/1.3273887 pmid:20136210 fatcat:3tkipeumbvapnp3uexa6ennppy
Showing results 1 to 15 out of 29 results