Exploring the Importance of F0 Trajectories for Speaker Anonymization using X-vectors and Neural Waveform Models
[article]
2021
arXiv
pre-print
Many state-of-the-art approaches are based on the resynthesis of the phoneme posteriorgrams (PPG) and the fundamental frequency (F0) of the input signal together with modified X-vectors. ...
Voice conversion for speaker anonymization is an emerging field in speech processing research. ...
A new identity is formed according to the speaker pool, then a new set of features is re-synthesized into a waveform using an acoustic model and a waveform model. ...
arXiv:2110.06887v1
fatcat:cbggzcp7tbcq3dhi7biykvtaeq
Speaker Anonymization Using X-vector and Neural Waveform Models
2019
10th ISCA Speech Synthesis Workshop
unpublished
The idea is to extract linguistic and speaker identity features from an utterance and then to use these with neural acoustic and waveform models to synthesize anonymized speech. ...
These are used to derive anonymized pseudo speaker identities through the combination of multiple, random speaker x-vectors. ...
Acknowledgements This work was partially supported by a JST CREST Grant (JPMJCR18A6, VoicePersonae project), Japan, and by MEXT KAKENHI Grants (16H06302, 17H04687, 18H04120, 18H04112, 18KT0051), Japan. ...
doi:10.21437/ssw.2019-28
fatcat:rxj6sr762nh67fxhslmjnpcp7u
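The entry above describes deriving an anonymized pseudo-speaker identity by combining multiple randomly chosen speaker x-vectors. Below is a minimal illustrative sketch of that idea, assuming x-vectors are plain NumPy arrays; the function name, pool shape, simple mean, and length normalization are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def pseudo_xvector(pool, n_mix=10, seed=None):
    """Average n_mix randomly selected x-vectors from pool (shape [n_speakers, dim])."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pool), size=n_mix, replace=False)
    mixed = pool[idx].mean(axis=0)
    # Length-normalize, as is common for x-vector back-ends.
    return mixed / np.linalg.norm(mixed)

# Example with a stand-in pool of 200 random 512-dimensional "x-vectors".
pool = np.random.randn(200, 512)
anon_xvec = pseudo_xvector(pool, n_mix=10, seed=0)
```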
Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models
[article]
2022
arXiv
pre-print
Current mainstream neural network speaker anonymization systems are complicated, containing an F0 extractor, speaker encoder, automatic speech recognition acoustic model (ASR AM), speech synthesis acoustic model, and speech waveform generation model. ...
Acknowledgment This study is supported by JST CREST Grants (JPMJCR18A6 and JPMJCR20D3), MEXT KAKENHI Grants (21K17775, 21H04906, 21K11951, 18H04112), and the VoicePersonae project (ANR-18-JSTS-0001). ...
arXiv:2202.13097v3
fatcat:73ntrvneqbfhposowp4kpucdpy
Introducing the VoicePrivacy Initiative
[article]
2020
arXiv
pre-print
We also present the attack models and the associated objective and subjective evaluation metrics. We introduce two anonymization baselines and report objective evaluation results. ...
In this paper, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used for system development and evaluation. ...
In Step 3, an SS AM generates Mel-filterbank features given the anonymized x-vector and the F0+BN features, and a neural source-filter (NSF) waveform model [36] outputs a speech signal given the anonymized ...
arXiv:2005.01387v3
fatcat:f4fgcoxqg5ftxcdx4lymkegna4
Introducing the VoicePrivacy Initiative
2020
Interspeech 2020
We also present the attack models and the associated objective and subjective evaluation metrics. We introduce two anonymization baselines and report objective evaluation results. ...
In this paper, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used for system development and evaluation. ...
In Step 3, an SS AM generates Mel-filterbank features given the anonymized x-vector and the F0+BN features, and a neural source-filter (NSF) waveform model [36] outputs a speech signal given the anonymized ...
doi:10.21437/interspeech.2020-1333
dblp:conf/interspeech/TomashenkoS00NY20
fatcat:65nqflofsnchzobt6h72mu7fla
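Step 3 as summarized in the entry above chains two models: a speech synthesis acoustic model (SS AM) maps the anonymized x-vector plus F0 and bottleneck (BN) features to Mel-filterbank frames, and a neural source-filter (NSF) waveform model turns those frames into audio. The following is a data-flow sketch only; the two model functions are stand-in stubs with assumed feature dimensions and hop size, not the real networks.

```python
import numpy as np

def ss_acoustic_model(xvector, f0, bn):
    """Stub SS AM: one 80-dim Mel-filterbank frame per input frame (assumed dims)."""
    n_frames = len(f0)
    return np.zeros((n_frames, 80))

def nsf_waveform_model(mel, f0):
    """Stub NSF model: waveform with an assumed hop of 160 samples per frame."""
    return np.zeros(len(mel) * 160)

anon_xvec = np.random.randn(512)            # anonymized speaker identity
f0 = np.abs(np.random.randn(300)) * 120.0   # 300 frames of F0 (Hz), stand-in values
bn = np.random.randn(300, 256)              # 300 frames of bottleneck (BN) features

mel = ss_acoustic_model(anon_xvec, f0, bn)  # Step 3a: SS AM -> Mel-filterbanks
wav = nsf_waveform_model(mel, f0)           # Step 3b: NSF model -> waveform
```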
Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release
[article]
2020
arXiv
pre-print
However, privacy and security concerns may hinder the collection and sharing of real-world speech data, which contain the speaker's identifiable information, i.e., voiceprint, which is considered a type ...
Experiments on public datasets verify the effectiveness and efficiency of the proposed methods. ...
Similarly, speaker anonymization using the x-vector and neural waveform models is discussed by Fang et al. [9]. ...
arXiv:2004.07442v1
fatcat:cwrgu7w4hvgxjj66dac7olweta
Evaluating X-vector-based Speaker Anonymization under White-box Assessment
[article]
2021
arXiv
pre-print
Targeting a unique identity also allows us to investigate whether some target identities are better than others for anonymizing a given speaker. ...
In the scenario of the Voice Privacy challenge, anonymization is achieved by converting all utterances from a source speaker to match the same target identity; this identity being randomly selected. ...
Experiments were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations. ...
arXiv:2109.11946v2
fatcat:2375jv7burgknirwjigq4qusxq
Speaker Anonymization with Distribution-Preserving X-Vector Generation for the VoicePrivacy Challenge 2020
[article]
2021
arXiv
pre-print
We use population data to learn the properties of the X-vector space, before fitting a generative model which we use to sample fake X-vectors. ...
Our method can be easily integrated with others as the anonymization component of the system and removes the need to distribute a pool of speakers to use during the anonymization. ...
A speech synthesis acoustic model is used to generate Mel-filterbanks, which are fed, together with the F0 and the new X-vector, to a neural source-filter model to generate audio. ...
arXiv:2010.13457v2
fatcat:hbhbmf3mmnaunh2wbgsgz4yrwa
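The entry above proposes learning the properties of the x-vector space from population data, fitting a generative model, and sampling fake x-vectors from it. The sketch below assumes a single full-covariance Gaussian purely for illustration; the actual generative model is not specified in this snippet, and the dimensions and ridge term are arbitrary assumptions.

```python
import numpy as np

def fit_gaussian(xvectors):
    """Estimate the mean and covariance of the population x-vectors ([N, dim])."""
    return xvectors.mean(axis=0), np.cov(xvectors, rowvar=False)

def sample_fake_xvector(mean, cov, rng=None):
    """Draw one synthetic x-vector from the fitted Gaussian and length-normalize it."""
    rng = rng or np.random.default_rng()
    # A small ridge keeps the estimated covariance numerically positive-definite.
    fake = rng.multivariate_normal(mean, cov + 1e-6 * np.eye(len(cov)))
    return fake / np.linalg.norm(fake)

population = np.random.randn(1000, 512)     # stand-in for real pool x-vectors
mean, cov = fit_gaussian(population)
fake_xvec = sample_fake_xvector(mean, cov)
```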
Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance
[article]
2021
arXiv
pre-print
In this paper, we investigate the use of quantized vectors to model the latent linguistic embedding and compare it with the continuous counterpart. ...
... the representation bit-rate, which is desirable for data transfer, or limiting information leakage, which is important for speaker anonymization and other tasks of that nature. ...
Fang, X. Wang, J. Yamagishi, I. Echizen, M. Todisco, N. Evans, and J.-F. Bonastre, "Speaker anonymization using x-vector and neural waveform models," in Proc. SSW, 2019, pp. 155–160. ...
arXiv:2106.13479v1
fatcat:3pva7ksvirgdzijtu5x7anizs4
The VoicePrivacy 2020 Challenge: Results and findings
[article]
2021
arXiv
pre-print
In particular, we describe the voice anonymization task and datasets used for system development and evaluation. ...
Also, we present different attack models and the associated objective and subjective evaluation metrics. ...
The authors acknowledge support by ANR, JST (21K17775), and the European Union's Horizon 2020 Research and Innovation Program, and they would like to thank Christine Meunier. ...
arXiv:2109.00648v3
fatcat:oyu4fa32xjfnvhno7h5mlcxr3i
A Tandem Framework Balancing Privacy and Security for Voice User Interfaces
[article]
2021
arXiv
pre-print
... identity and emotion, while preserving linguistic information. Adversaries may use advanced transformation tools to trigger a spoofing attack using fraudulent biometrics for a legitimate speaker. ...
In this paper,(i) we investigate the applicability of the current voice anonymization methods by deploying a tandem framework that jointly combines anti-spoofing and authentication models, and evaluate ...
We use an x-vector [63] embedding extractor network that was pre-trained using a recipe of the Kaldi toolkit [54]. ...
arXiv:2107.10045v1
fatcat:vgw7mmseurb3bfr6s2xg4c3enq
Sample Efficient Adaptive Text-to-Speech
[article]
2019
arXiv
pre-print
During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. ...
The experiments show that these approaches are successful at adapting the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity ...
This model maps a waveform sequence of arbitrary length to a fixed 256-dimensional d-vector with a sliding window, and is trained from approximately 36M utterances from 18K speakers extracted from anonymized ...
arXiv:1809.10460v3
fatcat:zchuw4fbifb37ddmem2yaqvmry
HiFi-VC: High Quality ASR-Based Voice Conversion
[article]
2022
arXiv
pre-print
Our approach uses automated speech recognition (ASR) features, pitch tracking, and a state-of-the-art waveform prediction model. ...
VC is usually used in entertainment and speaking-aid systems, as well as applied for speech data generation and augmentation. ...
The core source of recent advances lies in using deep learning (DL) and large datasets. Several works make use of neural waveform encoders [9, 10, 11] and decoders [12] . ...
arXiv:2203.16937v1
fatcat:oapl7fp4abdohntywkpudnicmm
A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy
[article]
2022
arXiv
pre-print
By operating on smaller frames, the waveform neural model is able to perform better at smaller sizes and is better suited for applications where memory is limited. ...
In this paper, we develop a conformer-based waveform-domain neural AEC model inspired by the "TasNet" architecture. ...
Acknowledgements We wish to thank Yuma Koizumi, Tom O'Malley and Joe Caroselli for discussions. ...
arXiv:2205.03481v1
fatcat:4ahmsra2gzfilc3sqwkjzw3nxa
Design Choices for X-Vector Based Speaker Anonymization
2020
Interspeech 2020
To assess the strength of anonymization achieved, we consider attackers using an x-vector based speaker verification system who may use original or anonymized speech for enrollment, depending on their ...
We explore several design choices for the distance metric between speakers, the region of x-vector space where the pseudo-speaker is picked, and gender selection. ...
Step 3 (Speech synthesis) synthesizes a speech waveform from the anonymized x-vector and the original BN and F0 features using an acoustic model (AM) and the NSF model. ...
doi:10.21437/interspeech.2020-2692
dblp:conf/interspeech/SrivastavaT00YM20
fatcat:ahbfjuoeefa7xb3mt2kmn7nke4
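The entry above explores design choices such as the distance metric between speakers and the region of x-vector space from which the pseudo-speaker is picked. The sketch below illustrates one such configuration, cosine distance with a "farthest" region followed by random selection and averaging; the metric, region size, and selection count are assumptions chosen for illustration, not the paper's reported best configuration.

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def farthest_pseudo_xvector(source, pool, n_region=200, n_select=100, rng=None):
    """Keep the n_region pool x-vectors farthest from the source, randomly
    pick n_select of them, and average the selection into a pseudo x-vector."""
    rng = rng or np.random.default_rng()
    dists = np.array([cosine_distance(source, x) for x in pool])
    region = np.argsort(dists)[-n_region:]          # the farthest region
    chosen = rng.choice(region, size=n_select, replace=False)
    pseudo = pool[chosen].mean(axis=0)
    return pseudo / np.linalg.norm(pseudo)

pool = np.random.randn(500, 512)                    # stand-in speaker pool
source = np.random.randn(512)                       # source speaker's x-vector
pseudo = farthest_pseudo_xvector(source, pool)
```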
Showing results 1 — 15 out of 432 results