High-quality nonparallel voice conversion based on cycle-consistent adversarial network [article]

Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba
2018 arXiv   pre-print
In this paper, we propose using a cycle-consistent adversarial network (CycleGAN) for nonparallel data-based VC training.  ...  A subjective evaluation of inter-gender conversion demonstrated that the proposed method significantly outperformed a method based on the Merlin open source neural network speech synthesis system (a parallel  ...  A potential way to improve the performance of nonparallel VC systems is to use a cycle-consistent adversarial network (Cycle-GAN) [11] .  ... 
arXiv:1804.00425v1 fatcat:kgw4ux5345hgtdzoq6axqaka5i

High-Quality Nonparallel Voice Conversion Based on Cycle-Consistent Adversarial Network

Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba
2018 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this paper, we propose using a cycle-consistent adversarial network (CycleGAN) for nonparallel data-based VC training.  ...  A subjective evaluation of inter-gender conversion demonstrated that the proposed method significantly outperformed a method based on the Merlin open source neural network speech synthesis system (a parallel  ...  A potential way to improve the performance of nonparallel VC systems is to use a cycle-consistent adversarial network (Cycle-GAN) [11] .  ... 
doi:10.1109/icassp.2018.8462342 dblp:conf/icassp/FangYEL18 fatcat:mk447nstpfg4vnycrlc7pbkku4

Many-to-Many Voice Conversion using Cycle-Consistent Variational Autoencoder with Multiple Decoders [article]

Keonnyeong Lee, In-Chul Yoo, Dongsuk Yook
2020 arXiv   pre-print
In this paper, we propose a cycle consistency loss for VAE to explicitly learn the conversion path.  ...  Though it can handle many-to-many voice conversion without the parallel training, the VAE based voice conversion methods suffer from low sound qualities of the converted speech.  ...  Though the VAEWGAN produces somewhat higher sound quality than the VAE, it can handle only one-to-one voice conversion.  ... 
arXiv:1909.06805v4 fatcat:v4rpa7cpkbh3lhkxwfqmaxx5iy

CycleGAN with Dual Adversarial Loss for Bone-Conducted Speech Enhancement [article]

Qing Pan, Teng Gao, Jian Zhou, Huabin Wang, Liang Tao, Hon Keung Kwan
2021 arXiv   pre-print
The proposed method uses an adversarial loss and a cycle-consistent loss simultaneously to learn forward and cyclic mapping, in which the adversarial loss is replaced with the classification adversarial  ...  Keywords: Bone-conducted speech enhancement, dual adversarial loss, Parallel CycleGAN, high frequency speech reconstruction  ...  The CycleGAN is a cycle-consistent adversarial networks with gated convolution and identity mapping loss, originally used for unpaired image style conversion [22] .  ... 
arXiv:2111.01430v1 fatcat:aw7nudc3gfaj3gphdtoh4bqake

CycleGAN Voice Conversion of Spectral Envelopes using Adversarial Weights [article]

Rafael Ferro, Nicolas Obin, Axel Roebel
2020 arXiv   pre-print
A subjective experiment conducted on a voice conversion task on the voice conversion challenge 2018 dataset shows first that despite a significantly reduced network complexity, the proposed method achieves  ...  Applying an energy constraint to the cycleGAN paradigm considerably improved conversion quality.  ...  In particular, parallel VC based on sequence-to-sequence models has recently reached a very good conversion quality [3, 4].  ... 
arXiv:1910.12614v2 fatcat:rfwer3saffbqpbxwgmx3ln3uiq

Time Domain Adversarial Voice Conversion for ADD 2022 [article]

Cheng Wen, Tingwei Guo, Xingjun Tan, Rui Yan, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li
2022 arXiv   pre-print
Firstly, we build an any-to-many voice conversion (VC) system to convert source speech with arbitrary language content into the target speaker’s fake speech.  ...  The experimental results show that our system has adversarial ability against anti-spoofing detectors with a little compromise in audio quality and speaker similarity.  ...  Such as Cycle-VAE [13] , Disentangled-VAE [14] , fang's CycleGAN-based nonparallel VC [15] , STARGAN-VC [16] and StarGANv2-VC [17] .  ... 
arXiv:2204.08692v2 fatcat:pabn2ol3rbhflbhlr24alky2ea

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN [article]

Zongyang Du, Kun Zhou, Berrak Sisman, Haizhou Li
2020 arXiv   pre-print
Previous studies on cross-lingual voice conversion mainly focus on spectral conversion with a linear transformation for F0 transfer.  ...  It relies on non-parallel training data from two different languages, hence, is more challenging than mono-lingual voice conversion.  ...  ] have achieved high-quality mono-lingual voice conversion with non-parallel data.  ... 
arXiv:2008.04562v3 fatcat:ha34ouykybe2zgreakjcqsfusy

Many-to-Many Unsupervised Speech Conversion From Nonparallel Corpora

Yun Kyung Lee, Hyun Woo Kim, Jeon Gue Park
2021 IEEE Access  
The proposed method comprises a variational autoencoder (VAE)-based many-to-many speech conversion network with a Wasserstein generative adversarial network (WGAN) and a skip-connected autoencoder-based  ...  We also train models in a stable manner and improve the quality of generated outputs by sharing the discriminator of the VAE-based speech conversion network and that of the self-supervised learning network  ...  The challenge in developing the nonparallel conversion model with as high a sound quality as parallel methods, has attracted many works in recent years.  ... 
doi:10.1109/access.2021.3058382 fatcat:mzmt5b6gs5a3vgvfhkqomjm7ja

Nonparallel Voice Conversion with Augmented Classifier Star Generative Adversarial Networks [article]

Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo
2020 arXiv   pre-print
We previously proposed a method that allows for nonparallel voice conversion (VC) by using a variant of generative adversarial networks (GANs) called StarGAN.  ...  In this paper, we describe three formulations of StarGAN, including a newly introduced novel StarGAN variant called "Augmented classifier StarGAN (A-StarGAN)", and compare them in a nonparallel VC task  ...  , cycle consistency, and identity mapping losses.  ... 
arXiv:2008.12604v7 fatcat:f2ps44jexzahbktnzcb7dpqabm

Cycle-consistent Adversarial Networks for Non-parallel Vocal Effort Based Speaking Style Conversion

Shreyas Seshadri, Lauri Juvela, Junichi Yamagishi, Okko Rasanen, Paavo Alku
2019 ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this study, we propose the use of cycle-consistent adversarial networks (CycleGANs) for converting styles with varying vocal effort, and focus on conversion between normal and Lombard styles as a case  ...  Speaking style conversion (SSC) is the technology of converting natural speech signals from one style to another.  ...  As a recent alternative to INCA, Cycle-consistent adversarial networks (CycleGANs, [20] ) have shown promise in the domain of voice conversion.  ... 
doi:10.1109/icassp.2019.8682648 dblp:conf/icassp/SeshadriJYRA19 fatcat:b276yynejngubmix5leelzhp2y

CycleGAN Voice Conversion of Spectral Envelopes using Adversarial Weights

Rafael Ferro
2020 Zenodo  
A subjective experiment conducted on a voice conversion task on the voice conversion challenge 2018 dataset shows first that despite a significantly reduced network complexity, the proposed method achieves  ...  Applying an energy constraint to the cycleGAN paradigm considerably improved conversion quality.  ...  Related works Voice identity conversion (VC) consists in modifying the voice of a source speaker so as to be perceived as the one of a target speaker.  ... 
doi:10.5281/zenodo.3956350 fatcat:6elbzecturfsxgomez33rmf2fm

Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks

Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma
2019 Interspeech 2019  
They also help generate high-quality fake samples at the very beginning of the adversarial training.  ...  This paper proposes a fast learning framework for non-parallel many-to-many voice conversion with residual Star Generative Adversarial Networks (StarGAN).  ...  generate high-quality fake samples at the very beginning of the adversarial training.  ... 
doi:10.21437/interspeech.2019-2067 dblp:conf/interspeech/ZhaoNWM19 fatcat:wcptecjupveg3jxhsdfyac5gia

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion [article]

Yinghao Aaron Li, Ali Zare, Nima Mesgarani
2021 arXiv   pre-print
We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2.  ...  Subjective and objective evaluation experiments on a non-parallel many-to-many voice conversion task revealed that our model produces natural sounding voices, close to the sound quality of state-of-the-art  ...  Acknowledgements We would like to acknowledge Ryo Kato for proposing Star-GAN v2 for voice conversion and funding is from the National Institute of Health, NIDCD.  ... 
arXiv:2107.10394v2 fatcat:57uzvbnqgjab3e6y2ion3g2jqq

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data [article]

Junchen Lu, Kun Zhou, Berrak Sisman, Haizhou Li
2020 arXiv   pre-print
In this paper, we propose a singing voice conversion framework that is based on VAW-GAN. We train an encoder to disentangle singer identity and singing prosody (F0 contour) from phonetic content.  ...  Singing voice conversion aims to convert singer's voice from source to target without changing singing content.  ...  easier than other nonparallel generative models, such as cycle-consistent generative adversarial network (CycleGAN) [6] , [30] , [33] , [34] .  ... 
arXiv:2008.03992v3 fatcat:h2qsjue56nez7fxaiiyx7tmwka

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition [article]

Luke Prananta, Bence Mark Halpern, Siyuan Feng, Odette Scharenborg
2022 arXiv   pre-print
In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech  ...  using state-of-the-art GAN-based voice conversion methods as measured using a phoneme recognition task.  ...  Department of Head and Neck Oncology and surgery of the Netherlands Cancer Institute receives a research grant from Atos Medical (Hörby, Sweden), which contributes to the existing infrastructure for quality  ... 
arXiv:2201.04908v1 fatcat:6irdfblu3bf5pfbjno6au6m5a4