A Survey of Speaker Recognition: Fundamental Theories, Recognition Methods and Opportunities
2021
IEEE Access
... to adapt neural network models ... Results may change with the increasing number of speakers' data ...
Here, both the front-end and back-end tasks are handled by a single architecture. A few standard end-to-end speaker recognition architectures are Deep Speaker, RawNet, AM-MobileNet, SincNet, etc. ...
doi:10.1109/access.2021.3084299
fatcat:6eavwhxg6jfwngu7bnwzjc4w3q
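The snippet above describes end-to-end speaker recognition, where one network maps the raw waveform directly to a speaker decision. A minimal sketch of that idea follows; the class name, layer sizes, and clip length are illustrative assumptions, not taken from the survey.

```python
import torch
import torch.nn as nn

class EndToEndSpeakerNet(nn.Module):
    """Minimal end-to-end speaker recognizer: raw waveform -> embedding -> speaker logits.
    Front-end (learned feature extractor) and back-end (classifier) live in one network."""
    def __init__(self, n_speakers: int, emb_dim: int = 128):
        super().__init__()
        self.frontend = nn.Sequential(            # learned front-end, replaces hand-crafted features
            nn.Conv1d(1, 64, kernel_size=251, stride=10), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),              # temporal pooling to a fixed-size vector
        )
        self.embedding = nn.Linear(128, emb_dim)  # speaker embedding
        self.classifier = nn.Linear(emb_dim, n_speakers)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        x = self.frontend(wav.unsqueeze(1)).squeeze(-1)
        return self.classifier(self.embedding(x))

logits = EndToEndSpeakerNet(n_speakers=10)(torch.randn(4, 16000))  # 4 one-second 16 kHz clips
print(logits.shape)  # torch.Size([4, 10])
```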
End-to-End Domain-Adversarial Voice Activity Detection
2020
Interspeech 2020
First, we design a neural network combining trainable filters and recurrent layers to tackle voice activity detection directly from the waveform. ...
To that end, a domain classification branch is added to the network and trained in an adversarial manner. ...
We would like to thank Neville Ryant for providing the speaker diarization output of the winning submission to DIHARD 2019.
doi:10.21437/interspeech.2020-2285
dblp:conf/interspeech/LavechinGBBG20
fatcat:ox2ibrrxhjgttbqibo2c53bgkm
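The abstract above mentions a domain classification branch trained in an adversarial manner. A common way to implement such a branch is a gradient-reversal layer in the style of Ganin & Lempitsky; the PyTorch sketch below assumes that mechanism and is not the authors' code.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the backward pass,
    so the shared feature extractor is pushed toward domain-invariant features
    while the domain classifier still tries to tell domains apart."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: shared features feed the VAD head normally, and the domain head
# through the reversal layer (hypothetical sizes: 64-dim features, 4 domains).
feats = torch.randn(8, 64, requires_grad=True)
domain_logits = torch.nn.Linear(64, 4)(grad_reverse(feats))
```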
Dual-Path Filter Network: Speaker-Aware Modeling for Speech Separation
[article]
2022
arXiv
pre-print
DPFN is composed of two parts: the speaker module and the separation module. First, the speaker module infers the identities of the speakers. ...
All related approaches can be divided into two categories: time-frequency domain methods and time domain methods. ...
Speaker Module for Unknown Speakers: When only the mixed waveform is given without the speaker information, we first separate the mixture through a pre-trained separation model and input the separated waveforms ...
arXiv:2106.07579v2
fatcat:nrdq3wxvtvakzho5aisq7c2nvi
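The DPFN abstract describes a speaker module that infers identities and a separation module that uses them. The toy sketch below only illustrates conditioning a mask-based separator on a speaker embedding; the shapes and names are hypothetical and far simpler than DPFN's dual-path design.

```python
import torch
import torch.nn as nn

class SpeakerConditionedSeparator(nn.Module):
    """Toy speaker-aware separator: a speaker embedding (from a speaker module)
    is broadcast along time and concatenated with mixture features before a
    mask is predicted for the target speaker."""
    def __init__(self, feat_dim=257, emb_dim=64):
        super().__init__()
        self.mask_net = nn.Sequential(
            nn.Linear(feat_dim + emb_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.Sigmoid(),  # soft mask in [0, 1]
        )

    def forward(self, mix_feats, spk_emb):
        # mix_feats: (batch, time, feat_dim); spk_emb: (batch, emb_dim)
        emb = spk_emb.unsqueeze(1).expand(-1, mix_feats.size(1), -1)
        mask = self.mask_net(torch.cat([mix_feats, emb], dim=-1))
        return mix_feats * mask                      # masked target estimate

est = SpeakerConditionedSeparator()(torch.randn(2, 100, 257), torch.randn(2, 64))
```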
End-to-end Domain-Adversarial Voice Activity Detection
[article]
2020
arXiv
pre-print
First, we design a neural network combining trainable filters and recurrent layers to tackle voice activity detection directly from the waveform. ...
To that end, a domain classification branch is added to the network and trained in an adversarial manner. ...
Reproducible research: All the code has been implemented using (and integrated into) pyannote.audio [19], a Python toolkit for building neural networks for the speaker diarization task. ...
arXiv:1910.10655v2
fatcat:utt3bgphhzdfdbtmzred7nvjqy
Privacy-sensitive audio features for conversational speech processing
2012
ACM SIGMultimedia Records
Analysis of conversations can then proceed by modeling the speaker turns and durations produced by speaker diarization. ...
Indeed, the main contributions of this thesis are the achievement of state-of-the-art performance in speech/non-speech detection and speaker diarization tasks using such features, which we refer to as ...
But looking back over these four memorable years, pockmarked with unending deadlines, spent working at Idiap and housed in ...
doi:10.1145/2206765.2206771
fatcat:rscelyhx6jer7plmdwfusgoppy
Speaker Recognition from Raw Waveform with SincNet
[article]
2019
arXiv
pre-print
Our experiments, conducted on both speaker identification and speaker verification tasks, show that the proposed architecture converges faster and performs better than a standard CNN on raw waveforms. ...
Rather than employing standard hand-crafted features, the latter CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker ...
This research was enabled in part by support provided by Calcul Québec and Compute Canada. ...
arXiv:1808.00158v3
fatcat:ox54rihverd2xmeevd6l6pjlou
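SincNet's core idea is to parameterize each first-layer convolution kernel as a band-pass sinc filter with just two learnable cutoff frequencies. A minimal sketch of building such kernels; the kernel size, window choice, and example cutoffs are illustrative, not the paper's exact configuration.

```python
import torch

def sinc_bandpass_kernels(f_low, f_high, kernel_size=251, sr=16000):
    """Build band-pass FIR kernels from cutoff frequencies (Hz): each kernel is
    the difference of two ideal low-pass sinc filters, so only two parameters
    per filter need to be learned -- the idea behind SincNet's first layer."""
    t = torch.arange(-(kernel_size // 2), kernel_size // 2 + 1,
                     dtype=torch.float32) / sr      # time axis in seconds

    def lowpass(fc):                                # ideal low-pass: 2*fc*sinc(2*fc*t)
        return 2 * fc.unsqueeze(1) * torch.sinc(2 * fc.unsqueeze(1) * t)

    window = torch.hamming_window(kernel_size)      # smooth truncation artifacts
    return (lowpass(f_high) - lowpass(f_low)) * window

f1 = torch.tensor([300.0, 1000.0])                  # low cutoffs, learnable in practice
f2 = torch.tensor([800.0, 3000.0])                  # high cutoffs
kernels = sinc_bandpass_kernels(f1, f2)             # shape: (2 filters, 251 taps)
```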
Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder
[article]
2021
arXiv
pre-print
... identity and linguistic content to achieve good performance on unseen speaker scenarios. ...
The method can disentangle speaker identity and linguistic content from utterances, so it can convert from many source speakers to many target speakers with a single autoencoder network. ...
Therefore, we believe that disentangled approaches are promising for not only voice conversion but also speaker verification and speaker diarization. ...
arXiv:2107.06642v1
fatcat:wh5iqfduinfovnur5yeq6cewme
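The abstract describes disentangling speaker identity from linguistic content within a single autoencoder. A toy variational sketch of that split follows, with hypothetical layer sizes and plain linear encoders/decoders rather than the paper's network.

```python
import torch
import torch.nn as nn

class DisentanglingVAE(nn.Module):
    """Toy many-to-many conversion autoencoder: a variational content encoder
    and a deterministic speaker encoder feed one shared decoder. Swapping the
    speaker embedding at inference converts a source utterance to a target voice."""
    def __init__(self, feat_dim=80, content_dim=16, spk_dim=16):
        super().__init__()
        self.content_enc = nn.Linear(feat_dim, 2 * content_dim)  # outputs mu and logvar
        self.speaker_enc = nn.Linear(feat_dim, spk_dim)
        self.decoder = nn.Linear(content_dim + spk_dim, feat_dim)

    def forward(self, x_src, x_tgt):
        mu, logvar = self.content_enc(x_src).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterization trick
        spk = self.speaker_enc(x_tgt)                            # target speaker embedding
        return self.decoder(torch.cat([z, spk], dim=-1)), mu, logvar

out, mu, logvar = DisentanglingVAE()(torch.randn(2, 80), torch.randn(2, 80))
```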
2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28
2020
IEEE/ACM Transactions on Audio Speech and Language Processing
..., +, TASLP 2020 1233-1247
Hidden Markov models
Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors. ...
Luong, H., +, TASLP 2020 2967-2981
Bayes methods
Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors. ...
Target tracking
Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. Zhang, Q., +, TASLP 2020 1183-1197 ...
doi:10.1109/taslp.2021.3055391
fatcat:7vmstynfqvaprgz6qy3ekinkt4
A Survey of Sound Source Localization with Deep Learning Methods
[article]
2021
arXiv
pre-print
We provide an exhaustive topography of the neural-based localization literature in this context, organized according to several aspects: the neural network architecture, the type of input features, the output strategy (classification or regression), the types of data used for model training and evaluation, and the model training strategy. ...
The specific case where we have several speakers taking speech turns with or without overlap is strongly connected to the speaker diarization problem ("who speaks when?") [53] , [54] , [55] . ...
arXiv:2109.03465v2
fatcat:4zsolfkfsfgavnykr72svnjkjq
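The survey's taxonomy separates classification and regression output strategies. The sketch below contrasts the two for azimuth estimation; the 5-degree bin width and the (cos, sin) regression target are illustrative conventions, not prescribed by the survey.

```python
import torch
import torch.nn as nn

feats = torch.randn(4, 256)            # shared features from any localization backbone

# Classification strategy: discretize azimuth into bins and predict a class.
cls_head = nn.Linear(256, 72)          # e.g. 72 bins of 5 degrees each
azimuth_bin = cls_head(feats).argmax(dim=-1) * 5.0

# Regression strategy: predict the angle directly; using (cos, sin) as the
# target avoids the 0/360-degree wrap-around discontinuity.
reg_head = nn.Linear(256, 2)
cos_sin = reg_head(feats)
azimuth_deg = torch.rad2deg(torch.atan2(cos_sin[:, 1], cos_sin[:, 0]))
```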
Identifying Speakers Using Deep Learning: A review
2021
Zenodo
Deep Neural Networks (DNNs) and Recurrent Neural Networks (RNNs) are two main types of deep learning models used in the implementation of such applications. ...
Speaker Identification is being used more and more on a daily basis and, as a result of this demand, has become a focus of the research community. ...
An et al. (2019) discussed two CNN-based methods for SID: Visual Geometry Group (VGG) nets and Residual Neural Networks (ResNets). ...
doi:10.5281/zenodo.4481596
fatcat:4cpsf3b7ijc6palkqry6zb6yya
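The review above mentions VGG- and ResNet-based CNNs for speaker identification. A minimal sketch of adapting an off-the-shelf ResNet-18 to single-channel spectrogram input, assuming torchvision is available; the speaker count and input shape are placeholders.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

n_speakers = 50                                   # illustrative class count
model = resnet18(num_classes=n_speakers)          # ResNet backbone, as in ResNet-based SID
# Spectrograms are single-channel, so replace the 3-channel RGB stem:
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

spec = torch.randn(8, 1, 64, 200)                 # (batch, 1, mel bins, frames)
logits = model(spec)                              # (8, n_speakers)
```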
Representation Based Meta-Learning for Few-Shot Spoken Intent Recognition
2020
Interspeech 2020
We evaluate three such approaches on our novel experimental protocol developed on two popular spoken intent classification datasets: Google Commands and the Fluent Speech Commands dataset. ...
For a 5-shot (1-shot) classification of novel classes, the proposed framework provides an average classification accuracy of 88.6% (76.3%) on the Google Commands dataset, and 78.5% (64.2%) on the Fluent ...
Further, the models are trained in an end-to-end fashion thereby passing gradients computed from classification error through the linear classifier and the representational neural network simultaneously ...
doi:10.21437/interspeech.2020-3208
dblp:conf/interspeech/MittalBKCSK20
fatcat:2zerbq2eh5a6rbco7a2jjqrlqu
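Representation-based few-shot approaches of the kind evaluated above classify by comparing embeddings produced by a trained representation network. Below is a nearest-prototype sketch in that spirit, not the paper's exact protocol; the embedding dimension and episode sizes are placeholders.

```python
import torch

def prototype_classify(support, support_labels, query, n_classes):
    """Few-shot classification by nearest class prototype: average the support
    embeddings per class, then assign each query to the closest prototype."""
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(n_classes)])
    dists = torch.cdist(query, protos)    # (n_query, n_classes) Euclidean distances
    return dists.argmin(dim=-1)

# A 5-way 1-shot episode on embeddings from any trained representation network:
support = torch.randn(5, 64)              # one labeled example per class
labels = torch.arange(5)
query = torch.randn(10, 64)
pred = prototype_classify(support, labels, query, n_classes=5)
```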
Table of Contents
2021
2021 29th Conference of Open Innovations Association (FRUCT)
unpublished
Speaker Diarization through ... Waveform and Neural Net ...
On Applying Convolutional Neural Network to Bearing Fault Detection ...
doi:10.23919/fruct52173.2021.9435600
fatcat:maognf2vbrcu3o63lgnj56ynpq
Biometrics Recognition Using Deep Learning: A Survey
[article]
2021
arXiv
pre-print
... deploy deep learning models, and show their strengths and potential in different applications. ...
For each biometric, we first introduce the available datasets that are widely used in the literature and their characteristics. ...
... Rama Chellappa, and Dr. Nalini Ratha for reviewing this work, and providing very helpful comments and suggestions. ...
arXiv:1912.00271v3
fatcat:nobon7vrrrdnxe4pr3q2anl63y
Unsupervised Learning for Expressive Speech Synthesis
2018
IberSPEECH 2018
The main difficulty consists in the highly speaker- and situation-dependent nature of expressiveness, which causes numerous, acoustically substantial variations. ...
... sets in the multi-speaker domain. ...
Acknowledgements First of all I would like to thank Antonio Bonafonte for his help, lead and patience, and for the opportunity to work and to develop this work in his group. ...
doi:10.21437/iberspeech.2018-38
dblp:conf/iberspeech/Jauk18
fatcat:6zogjdy3gjgslfbbgrqirjzsx4
Acoustic censusing using automatic vocalization classification and identity recognition
2010
Journal of the Acoustical Society of America
Individually distinct acoustic features have been observed in a wide range of animal species, and this, combined with the widespread success of speaker identification and verification methods for human ...
The underlying algorithm is based on clustering using hidden Markov models (HMMs) and Gaussian mixture models (GMMs), similar to methods used in the speech recognition community for tasks such as speaker ...
Figures 5 and 6 illustrate the language model and waveform-to-HMM matching process. ...
doi:10.1121/1.3273887
pmid:20136210
fatcat:3tkipeumbvapnp3uexa6ennppy
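The abstract above mentions GMM-based methods borrowed from speaker identification. A minimal sketch of per-individual GMM scoring with scikit-learn on synthetic features; real use would fit on acoustic features such as MFCC frames, and all sizes here are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy 13-dim feature vectors (stand-ins for MFCC frames) from two individuals:
calls_a = rng.normal(0.0, 1.0, size=(200, 13))
calls_b = rng.normal(2.0, 1.0, size=(200, 13))

# Fit one GMM per individual, as in GMM-based speaker identification:
gmm_a = GaussianMixture(n_components=4, random_state=0).fit(calls_a)
gmm_b = GaussianMixture(n_components=4, random_state=0).fit(calls_b)

# Identify a new vocalization by the higher average log-likelihood:
test = rng.normal(2.0, 1.0, size=(50, 13))
scores = [gmm_a.score(test), gmm_b.score(test)]
print("predicted individual:", "A" if scores[0] > scores[1] else "B")
```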