432 Hits in 2.7 sec

Exploring the Importance of F0 Trajectories for Speaker Anonymization using X-vectors and Neural Waveform Models [article]

Ünal Ege Gaznepoglu, Nils Peters
2021 arXiv   pre-print
Many state-of-the-art approaches are based on resynthesizing the phoneme posteriorgrams (PPG) and the fundamental frequency (F0) of the input signal together with modified X-vectors.  ...  Voice conversion for speaker anonymization is an emerging field in speech processing research.  ...  A new identity is formed from the speaker pool, then the new set of features is re-synthesized into a waveform using an acoustic model and a waveform model.  ... 
arXiv:2110.06887v1 fatcat:cbggzcp7tbcq3dhi7biykvtaeq
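The Gaznepoglu and Peters entry above outlines the common recipe: extract content features (PPG/BN), the F0 trajectory, and an x-vector, swap the x-vector for a new identity drawn from a speaker pool, and resynthesize a waveform. The following Python sketch only illustrates that data flow; the frame rates, feature dimensions, and the stand-in extractor/synthesizer functions are illustrative assumptions, not the authors' models.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(wav):
    """Stand-ins for the real extractors: per-frame content features (PPG/BN),
    an F0 trajectory, and a 512-dim speaker x-vector."""
    n_frames = len(wav) // 160                 # assuming a 10 ms hop at 16 kHz
    ppg = rng.random((n_frames, 256))          # content features per frame
    f0 = 50.0 + 200.0 * rng.random(n_frames)   # Hz (toy values)
    xvec = rng.random(512)
    return ppg, f0, xvec

def new_identity(pool):
    """Form a new identity according to the speaker pool; here simply a
    random pick, one of several possible selection strategies."""
    return pool[rng.integers(len(pool))]

def resynthesize(ppg, f0, xvec):
    """Placeholder for the acoustic model + waveform model: returns a dummy
    waveform of the right length instead of running real networks."""
    return rng.standard_normal(len(f0) * 160)

wav = rng.standard_normal(16000)               # 1 s of fake input audio
pool = rng.random((200, 512))                  # external x-vector pool
ppg, f0, _ = extract_features(wav)
anon_wav = resynthesize(ppg, f0, new_identity(pool))
print(anon_wav.shape)                          # (16000,)
```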

Speaker Anonymization Using X-vector and Neural Waveform Models

Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas Evans, Jean-Francois Bonastre
2019 10th ISCA Speech Synthesis Workshop   unpublished
The idea is to extract linguistic and speaker identity features from an utterance and then to use these with neural acoustic and waveform models to synthesize anonymized speech.  ...  These are used to derive anonymized pseudo speaker identities through the combination of multiple, random speaker x-vectors.  ...  Acknowledgements This work was partially supported by a JST CREST Grant (JPMJCR18A6, VoicePersonae project), Japan, and by MEXT KAKENHI Grants (16H06302, 17H04687, 18H04120, 18H04112, 18KT0051), Japan.  ... 
doi:10.21437/ssw.2019-28 fatcat:rxj6sr762nh67fxhslmjnpcp7u
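The Fang et al. snippet derives anonymized pseudo speaker identities by combining multiple, random speaker x-vectors. Below is a minimal numpy sketch of one such combination (random selection, averaging, length normalization); the pool size, dimensionality, and normalization step are assumptions for illustration.

```python
import numpy as np

def pseudo_xvector(pool, n_mix=10, rng=None):
    """Combine several randomly chosen x-vectors from an external pool into
    one pseudo-speaker identity: simple mean followed by length normalization."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(pool), size=n_mix, replace=False)
    mixed = pool[idx].mean(axis=0)
    return mixed / np.linalg.norm(mixed)

pool = np.random.default_rng(1).random((500, 512))   # toy pool of 512-dim x-vectors
print(pseudo_xvector(pool).shape)                     # (512,)
```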

Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models [article]

Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko
2022 arXiv   pre-print
Current mainstream neural network speaker anonymization systems are complicated, containing an F0 extractor, speaker encoder, automatic speech recognition acoustic model (ASR AM), speech synthesis acoustic model, and speech waveform generation model.  ...  Acknowledgment This study is supported by JST CREST Grants (JPMJCR18A6 and JPMJCR20D3), MEXT KAKENHI Grants (21K17775, 21H04906, 21K11951, 18H04112), and the VoicePersonae project (ANR-18-JSTS-0001).  ... 
arXiv:2202.13097v3 fatcat:73ntrvneqbfhposowp4kpucdpy

Introducing the VoicePrivacy Initiative [article]

Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco
2020 arXiv   pre-print
We also present the attack models and the associated objective and subjective evaluation metrics. We introduce two anonymization baselines and report objective evaluation results.  ...  In this paper, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used for system development and evaluation.  ...  In Step 3, an SS AM generates Mel-filterbank features given the anonymized x-vector and the F0+BN features, and a neural source-filter (NSF) waveform model [36] outputs a speech signal given the anonymized  ... 
arXiv:2005.01387v3 fatcat:f4fgcoxqg5ftxcdx4lymkegna4

Introducing the VoicePrivacy Initiative

N. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco
2020 Interspeech 2020  
We also present the attack models and the associated objective and subjective evaluation metrics. We introduce two anonymization baselines and report objective evaluation results.  ...  In this paper, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used for system development and evaluation.  ...  In Step 3, an SS AM generates Mel-filterbank features given the anonymized x-vector and the F0+BN features, and a neural source-filter (NSF) waveform model [36] outputs a speech signal given the anonymized  ... 
doi:10.21437/interspeech.2020-1333 dblp:conf/interspeech/TomashenkoS00NY20 fatcat:65nqflofsnchzobt6h72mu7fla
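Both VoicePrivacy entries above describe the same Step 3 of the baseline: a speech synthesis acoustic model (SS AM) produces Mel-filterbank frames from the anonymized x-vector plus the F0+BN features, and a neural source-filter (NSF) waveform model turns them into speech. The sketch below only shows the shapes of that interface; the linear "acoustic model" and the sine-based "waveform model" are toy stand-ins, not the challenge baseline code.

```python
import numpy as np

rng = np.random.default_rng(2)
T, BN_DIM, XVEC_DIM, N_MELS, HOP = 120, 256, 512, 80, 160   # assumed dimensions

# Inputs to Step 3: per-frame F0 and BN features plus one anonymized x-vector.
f0 = 200.0 * rng.random(T)
bn = rng.random((T, BN_DIM))
anon_xvec = rng.random(XVEC_DIM)

def ss_acoustic_model(f0, bn, xvec):
    """Toy acoustic model: concatenate the per-frame inputs with the broadcast
    x-vector and project to N_MELS Mel-filterbank bins."""
    inp = np.concatenate([f0[:, None], bn, np.tile(xvec, (len(f0), 1))], axis=1)
    proj = rng.standard_normal((inp.shape[1], N_MELS)) * 0.01
    return inp @ proj                                        # (T, N_MELS)

def nsf_waveform_model(mels, f0):
    """Toy stand-in for the NSF model: an F0-driven sine source scaled per
    frame, one HOP-sized block per Mel frame (not the real network)."""
    phase = np.cumsum(np.repeat(f0, HOP) / 16000.0)
    return np.sin(2 * np.pi * phase) * np.repeat(mels.mean(axis=1), HOP)

mels = ss_acoustic_model(f0, bn, anon_xvec)
wav = nsf_waveform_model(mels, f0)
print(mels.shape, wav.shape)                                 # (120, 80) (19200,)
```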

Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release [article]

Yaowei Han, Sheng Li, Yang Cao, Qiang Ma, Masatoshi Yoshikawa
2020 arXiv   pre-print
However, privacy and security concerns may hinder the collection and sharing of real-world speech data, which contain the speaker's identifiable information, i.e., voiceprint, which is considered a type  ...  Experiments on public datasets verify the effectiveness and efficiency of the proposed methods.  ...  Similarly, speaker anonymization using the x-vector and neural waveform models is discussed by Fang et al. [9] .  ... 
arXiv:2004.07442v1 fatcat:cwrgu7w4hvgxjj66dac7olweta

Evaluating X-vector-based Speaker Anonymization under White-box Assessment [article]

Pierre Champion
2021 arXiv   pre-print
Targeting a unique identity also allows us to investigate whether some target identities are better than others for anonymizing a given speaker.  ...  In the scenario of the VoicePrivacy challenge, anonymization is achieved by converting all utterances from a source speaker to match the same target identity; this identity being randomly selected.  ...  Experiments were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations.  ... 
arXiv:2109.11946v2 fatcat:2375jv7burgknirwjigq4qusxq
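The entry above anonymizes by converting every utterance of a source speaker toward one and the same target identity, chosen at random, and then asks whether some targets hide a given speaker better than others. A toy sketch of that comparison; using cosine similarity between x-vectors as the closeness proxy is my assumption, not the paper's evaluation protocol.

```python
import numpy as np

rng = np.random.default_rng(3)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

source = rng.random(512)              # toy x-vector of the source speaker
targets = rng.random((50, 512))       # candidate target identities

# Baseline behaviour: one randomly selected target for all of the source's utterances.
random_target = rng.integers(len(targets))

# White-box question: are some targets better (less similar to the source) than others?
sims = np.array([cosine(source, t) for t in targets])
best, worst = int(sims.argmin()), int(sims.argmax())
print(f"random target {random_target}: similarity {sims[random_target]:.3f}")
print(f"least similar target {best}: similarity {sims[best]:.3f}")
print(f"most similar target {worst}: similarity {sims[worst]:.3f}")
```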

Speaker Anonymization with Distribution-Preserving X-Vector Generation for the VoicePrivacy Challenge 2020 [article]

Henry Turner, Giulio Lovisotto, Ivan Martinovic
2021 arXiv   pre-print
We use population data to learn the properties of the X-vector space, before fitting a generative model which we use to sample fake X-vectors.  ...  Our method can be easily integrated with others as the anonymization component of the system and removes the need to distribute a pool of speakers to use during the anonymization.  ...  A speech synthesis acoustic model is used to generate Mel-filterbanks, which are fed together with the F0 and the new X-vector to a neural source-filter model to generate audio.  ... 
arXiv:2010.13457v2 fatcat:hbhbmf3mmnaunh2wbgsgz4yrwa
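Turner et al. learn the properties of the X-vector space from population data, fit a generative model, and sample fake X-vectors from it, removing the need to ship a speaker pool. A small sketch of that idea using a Gaussian mixture as the generative model; the GMM choice, component count, and toy dimensionality are assumptions for illustration, not necessarily the authors' exact model.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

# Toy "population" of x-vectors standing in for real extracted ones.
population = rng.standard_normal((2000, 64))

# Fit a generative model to the population, then sample fake x-vectors from it
# instead of reusing real speakers from a distributed pool.
gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(population)
fake_xvectors, _ = gmm.sample(n_samples=5)
print(fake_xvectors.shape)            # (5, 64)
```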

Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance [article]

Hieu-Thi Luong, Junichi Yamagishi
2021 arXiv   pre-print
In this paper, we investigate the use of quantized vectors to model the latent linguistic embedding and compare it with the continuous counterpart.  ...  the representation bit-rate, which is desirable for data transfer, or limiting information leakage, which is important for speaker anonymization and other tasks of that nature.  ...  F. Fang, X. Wang, J. Yamagishi, I. Echizen, M. Todisco, N. Evans, and J.-F. Bonastre, "Speaker anonymization using x-vector and neural waveform models," in Proc. SSW, 2019, pp. 155–160.  ... 
arXiv:2106.13479v1 fatcat:3pva7ksvirgdzijtu5x7anizs4
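The entry above compares a continuous latent linguistic embedding with a quantized one, motivated by lower bit-rate and reduced information leakage. A minimal sketch of the vector-quantization step itself, i.e. replacing each frame by its nearest codebook entry; codebook size and embedding dimension are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

codebook = rng.standard_normal((256, 64))    # 256 codewords of dimension 64
latents = rng.standard_normal((100, 64))     # per-frame continuous embeddings

# Vector quantization: squared distances to every codeword, keep the nearest.
dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
codes = dists.argmin(axis=1)                 # discrete indices (the bit-rate saving)
quantized = codebook[codes]                  # (100, 64) quantized embeddings

print(codes[:10], quantized.shape)
```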

The VoicePrivacy 2020 Challenge: Results and findings [article]

Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Jose Patino, Brij Mohan Lal Srivastava, Paul-Gauthier Noé, Andreas Nautsch, Nicholas Evans, Junichi Yamagishi, Benjamin O'Brien, Anaïs Chanclu, Jean-François Bonastre (+2 others)
2021 arXiv   pre-print
In particular, we describe the voice anonymization task and datasets used for system development and evaluation.  ...  Also, we present different attack models and the associated objective and subjective evaluation metrics.  ...  The authors acknowledge support by ANR, JST (21K17775), and the European Union's Horizon 2020 Research and Innovation Program, and they would like to thank Christine Meunier.  ... 
arXiv:2109.00648v3 fatcat:oyu4fa32xjfnvhno7h5mlcxr3i

A Tandem Framework Balancing Privacy and Security for Voice User Interfaces [article]

Ranya Aloufi, Hamed Haddadi, David Boyle
2021 arXiv   pre-print
e.g., identity and emotion, while preserving linguistic information. Adversaries may use advanced transformation tools to trigger a spoofing attack using fraudulent biometrics for a legitimate speaker.  ...  In this paper, (i) we investigate the applicability of the current voice anonymization methods by deploying a tandem framework that jointly combines anti-spoofing and authentication models, and evaluate  ...  We use an x-vector [63] embedding extractor network that was pre-trained using a recipe of the Kaldi toolkit [54].  ... 
arXiv:2107.10045v1 fatcat:vgw7mmseurb3bfr6s2xg4c3enq
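The tandem framework above jointly combines an anti-spoofing model with an authentication (speaker verification) model. One simple way such a gate can be realized is to accept a trial only if it passes both the countermeasure and the ASV decision; the thresholds and scores below are toy assumptions, not the paper's trained models.

```python
def tandem_accept(cm_score, asv_score, cm_threshold=0.5, asv_threshold=0.0):
    """Accept a trial only if the countermeasure (anti-spoofing) model deems it
    bona fide AND the speaker verification score passes its own threshold."""
    return cm_score >= cm_threshold and asv_score >= asv_threshold

trials = [
    {"cm": 0.9, "asv": 1.2},    # bona fide target speaker   -> accept
    {"cm": 0.1, "asv": 1.5},    # spoofed but matching voice -> rejected by the CM gate
    {"cm": 0.8, "asv": -2.0},   # bona fide non-target       -> rejected by the ASV gate
]
for t in trials:
    print(t, "->", tandem_accept(t["cm"], t["asv"]))
```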

Sample Efficient Adaptive Text-to-Speech [article]

Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord (+2 others)
2019 arXiv   pre-print
During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker.  ...  The experiments show that these approaches are successful at adapting the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity  ...  This model maps a waveform sequence of arbitrary length to a fixed 256-dimensional d-vector with a sliding window, and is trained from approximately 36M utterances from 18K speakers extracted from anonymized  ... 
arXiv:1809.10460v3 fatcat:zchuw4fbifb37ddmem2yaqvmry
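The adaptive-TTS snippet mentions a speaker encoder that maps a waveform of arbitrary length to a fixed 256-dimensional d-vector with a sliding window. A shape-level sketch of that windowing-and-averaging pattern; the window/hop sizes and the dummy per-window projection are assumptions standing in for the trained network.

```python
import numpy as np

rng = np.random.default_rng(6)
SR, WIN, HOP, DIM = 16000, 16000, 8000, 256    # 1 s windows, 0.5 s hop (assumed)
proj = rng.standard_normal((WIN, DIM)) * 1e-3  # dummy per-window "encoder"

def d_vector(wav):
    """Slide a window over the waveform, embed each window, average the window
    embeddings, and length-normalize to get one fixed-size speaker embedding."""
    windows = [wav[s:s + WIN] for s in range(0, len(wav) - WIN + 1, HOP)]
    embs = np.stack([w @ proj for w in windows])
    mean = embs.mean(axis=0)
    return mean / np.linalg.norm(mean)

wav = rng.standard_normal(5 * SR)              # 5 s of fake audio; any length works
print(d_vector(wav).shape)                     # (256,)
```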

HiFi-VC: High Quality ASR-Based Voice Conversion [article]

A. Kashkin, I. Karpukhin, S. Shishkin
2022 arXiv   pre-print
Our approach uses automated speech recognition (ASR) features, pitch tracking, and a state-of-the-art waveform prediction model.  ...  VC is commonly used in entertainment and speaking-aid systems, as well as for speech data generation and augmentation.  ...  The core source of recent advances lies in using deep learning (DL) and large datasets. Several works make use of neural waveform encoders [9, 10, 11] and decoders [12].  ... 
arXiv:2203.16937v1 fatcat:oapl7fp4abdohntywkpudnicmm

A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy [article]

Sankaran Panchapagesan, Arun Narayanan, Turaj Zakizadeh Shabestary, Shuai Shao, Nathan Howard, Alex Park, James Walker, Alexander Gruenstein
2022 arXiv   pre-print
By operating on smaller frames, the waveform neural model is able to perform better at smaller sizes and is better suited for applications where memory is limited.  ...  In this paper, we develop a conformer-based waveform-domain neural AEC model inspired by the "TasNet" architecture.  ...  Acknowledgements We wish to thank Yuma Koizumi, Tom O'Malley and Joe Caroselli for discussions.  ... 
arXiv:2205.03481v1 fatcat:4ahmsra2gzfilc3sqwkjzw3nxa
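The AEC entry above stresses that operating directly on small waveform frames keeps the neural model small and memory-friendly. Below is a generic frame-in/frame-out loop in that spirit; the frame size, the far-end reference handling, and the trivial per-frame "canceller" are assumptions, not the paper's conformer architecture.

```python
import numpy as np

rng = np.random.default_rng(7)
FRAME = 256                                    # small waveform frames (assumed size)

def process_frame(mic_frame, ref_frame):
    """Toy per-frame 'echo canceller': subtract a scaled far-end reference.
    A real system would apply a learned network to each frame instead."""
    return mic_frame - 0.5 * ref_frame

mic = rng.standard_normal(16000)               # microphone signal containing echo
ref = rng.standard_normal(16000)               # far-end (loudspeaker) reference

out = np.concatenate([
    process_frame(mic[i:i + FRAME], ref[i:i + FRAME])
    for i in range(0, len(mic) - FRAME + 1, FRAME)
])
print(out.shape)                               # processed samples, frame by frame
```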

Design Choices for X-Vector Based Speaker Anonymization

Brij Mohan Lal Srivastava, N. Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi
2020 Interspeech 2020  
To assess the strength of anonymization achieved, we consider attackers using an x-vector based speaker verification system who may use original or anonymized speech for enrollment, depending on their  ...  We explore several design choices for the distance metric between speakers, the region of x-vector space where the pseudo-speaker is picked, and gender selection.  ...  Step 3 (Speech synthesis) synthesizes a speech waveform from the anonymized x-vector and the original BN and F0 features using an acoustic model (AM) and the NSF model.  ... 
doi:10.21437/interspeech.2020-2692 dblp:conf/interspeech/SrivastavaT00YM20 fatcat:ahbfjuoeefa7xb3mt2kmn7nke4
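The design-choices entry explores the distance metric between speakers and the region of x-vector space from which the pseudo-speaker is drawn. The sketch below illustrates one such combination: rank pool speakers by cosine distance from the source, keep a "far" region, and average a random subset of it; the specific region size, subset size, and use of cosine distance are illustrative assumptions rather than the paper's chosen settings.

```python
import numpy as np

rng = np.random.default_rng(8)

def cosine_dist(a, b):
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pseudo_speaker(source, pool, region=50, n_avg=20):
    """Pick the `region` pool x-vectors farthest from the source (one possible
    region of x-vector space), then average a random subset as the pseudo-speaker."""
    dists = np.array([cosine_dist(source, p) for p in pool])
    far = np.argsort(dists)[-region:]
    chosen = pool[rng.choice(far, size=n_avg, replace=False)]
    mean = chosen.mean(axis=0)
    return mean / np.linalg.norm(mean)

source = rng.random(512)                       # toy source-speaker x-vector
pool = rng.random((300, 512))                  # toy external x-vector pool
print(pseudo_speaker(source, pool).shape)      # (512,)
```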
Showing results 1 — 15 out of 432 results