112 Hits in 4.0 sec

Parallel WaveNet: Fast High-Fidelity Speech Synthesis [article]

Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe (+9 others)
2017 arXiv   pre-print
The resulting system is capable of generating high-fidelity speech samples at more than 20 times faster than real-time, and is deployed online by Google Assistant, including serving multiple English and  ...  The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system  ...  Acknowledgements In this paper, we have described the research advances that made it possible for WaveNet to meet the speed and quality requirements for being used in production at Google.  ... 
arXiv:1711.10433v1 fatcat:ikhszzl4evgohjv7dkoyvcclm4

Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression [article]

Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda
2020 arXiv   pre-print
The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in recent works.  ...  In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique.  ...  for generating high-fidelity speech.  ... 
arXiv:2003.11750v1 fatcat:7t6lhbaqmjhfzpkpxhlwmoroay

WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU [article]

Po-chun Hsu, Hung-yi Lee
2020 arXiv   pre-print
In this paper, we propose WG-WaveNet, a fast, lightweight, and high-quality waveform generation model. WG-WaveNet is composed of a compact flow-based model and a post-filter.  ...  Furthermore, even if synthesizing on a CPU, we show that the proposed method is capable of generating 44.1 kHz speech waveform 1.2 times faster than real-time.  ...  WG-WaveNet has the MOS comparable to that of Parallel WaveGAN, and the inference speed is faster than other methods, which shows the advantage of WG-WaveNet as a vocoder for fast high-quality speech synthesis  ... 
arXiv:2005.07412v3 fatcat:r3rndqwuqfcobncv3jetrlmcnq

Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder [article]

Qiao Tian, Xucheng Wan, Shan Liu
2019 arXiv   pre-print
Although state-of-the-art parallel WaveNet has addressed the issue of real-time waveform generation, there remains problems.  ...  Secondly, a parallel WaveNet is trained under a distillation framework, which makes it tedious to adapt a well trained model to a new speaker.  ...  WaveNet Adaptation Speaker adaptation is a commonly adopted method for fast building of acoustic models for speech synthesis and speech recognition, especially for cases where training data are limited  ... 
arXiv:1812.02339v2 fatcat:g72hmes32fabbkbrmh7zya365a

Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression

Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda
2020 IEEE Access  
The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in recent works.  ...  In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique.  ...  for generating high-fidelity speech.  ... 
doi:10.1109/access.2020.2984007 fatcat:eg35hus5izc2hiycweguufa7xi

WG-WaveNet: Real-Time High-Fidelity Speech Synthesis Without GPU

Po-chun Hsu, Hung-yi Lee
2020 Interspeech 2020  
In this paper, we propose WG-WaveNet, a fast, lightweight, and high-quality waveform generation model. WG-WaveNet is composed of a compact flow-based model and a post-filter.  ...  Furthermore, even if synthesizing on a CPU, we show that the proposed method is capable of generating 44.1 kHz speech waveform 1.2 times faster than real-time.  ...  WG-WaveNet has the MOS comparable to that of Parallel WaveGAN, and the inference speed is faster than other methods, which shows the advantage of WG-WaveNet as a vocoder for fast high-quality speech synthesis  ... 
doi:10.21437/interspeech.2020-1736 dblp:conf/interspeech/HsuL20 fatcat:fd4f3i5665ewtik4smea7vgika

WaveNet: A Generative Model for Raw Audio [article]

Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu
2016 arXiv   pre-print
A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity.  ...  This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.  ...  Fast, compact, and high quality LSTM-RNN based statistical parametric speech synthesizers for mobile devices. In Interspeech, 2016. URL https://arxiv.org/abs/1606.06061.  ... 
arXiv:1609.03499v2 fatcat:x2j3gbxuczaaldjl2r2p54qx2m

Parallel Synthesis for Autoregressive Speech Generation [article]

Po-chun Hsu, Da-rong Liu, Andy T. Liu, Hung-yi Lee
2022 arXiv   pre-print
Autoregressive models have achieved outstanding performance in neural speech synthesis tasks.  ...  Many works were dedicated to generating the whole speech time sequence in parallel and then proposed GAN-based, flow-based, and score-based models.  ...  While inference using a CPU or GPU, the synthesis speed achieves real-time and is as fast as parallel methods.  ... 
arXiv:2204.11806v1 fatcat:so6p3y2rffbm5gy3k4ravimwym

ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech [article]

Wei Ping, Kainan Peng, Jitong Chen
2019 arXiv   pre-print
In addition, we introduce the first text-to-wave neural architecture for speech synthesis, which is fully convolutional and enables fast end-to-end training from scratch.  ...  In this work, we propose a new solution for parallel wave generation by WaveNet.  ...  Acknowledgements We thank Yongguo Kang, Yu Gu and Tao Sun from Baidu Speech Department for very helpful discussions. We also thank anonymous reviewers for their valuable feedback and suggestions.  ... 
arXiv:1807.07281v3 fatcat:ms5aytw5sfdnlczv75xftu6s2q

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram [article]

Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
2020 arXiv   pre-print
Furthermore, our model is able to generate high-fidelity speech even with its compact architecture.  ...  Parallel WaveNet system.  ...  Note that most listeners were unsatisfied with the high-frequency noise caused by the autoregressive WaveNet system.  ... 
arXiv:1910.11480v2 fatcat:uh6nagxx7fan5lf4gknmii3qhi

Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder [article]

Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu
2018 arXiv   pre-print
synthesis that uses the WaveNet vocoder.  ...  Recent neural networks such as WaveNet and sampleRNN that learn directly from speech waveform samples have achieved very high-quality synthetic speech in terms of both naturalness and speaker similarity  ...  It took about a week to train a high-quality multi-speaker WaveNet vocoder and eight minutes to synthesize ten seconds of speech.  ... 
arXiv:1807.11679v1 fatcat:pqsrkbf7cnerndm554fj26ynhy

Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics [article]

Vivek Jayaram, John Thickstun
2021 arXiv   pre-print
This approach parallelizes the sampling process and generalizes to conditional sampling.  ...  Parallel wavenet: Fast high-fidelity speech synthesis. In International Conference on Machine Learning, 2018. Veaux, C., Yamagishi, J., MacDonald, K., et al.  ...  In IEEE International diverse high-fidelity images with vq-vae-2.  ... 
arXiv:2105.08164v2 fatcat:sed6pvnne5gmpcev57isqbkpcm

WaveFlow: A Compact Flow-based Model for Raw Audio [article]

Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song
2020 arXiv   pre-print
It generates high-fidelity speech as WaveNet, while synthesizing several orders of magnitude faster as it only requires a few sequential steps to generate very long waveforms with hundreds of thousands  ...  It can generate 22.05 kHz high-fidelity audio 42.6× faster than real-time (at a rate of 939.3 kHz) on a V100 GPU without engineered inference kernels.  ...  As a result, WaveFlow is a very compelling neural vocoder, which features i) simple likelihood-based training, ii) high-fidelity & ultra-fast synthesis, and iii) small memory footprint.  ... 
arXiv:1912.01219v4 fatcat:irizts3shbbsjhxohcwuti37oy

FloWaveNet : A Generative Flow for Raw Audio [article]

Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon
2019 arXiv   pre-print
Most modern text-to-speech architectures use a WaveNet vocoder for synthesizing high-fidelity waveform audio, but there have been limitations, such as high inference time, in its practical application  ...  The recently suggested Parallel WaveNet and ClariNet have achieved real-time audio synthesis capability by incorporating inverse autoregressive flow for parallel sampling.  ...  Current state-of-the-art text-to-speech architectures commonly use the WaveNet vocoder with a mel-scale spectrogram as an input for high-fidelity audio synthesis (Shen Proceedings Arik et al., 2017b  ... 
arXiv:1811.02155v3 fatcat:yfcwwal4xjck3hilkggz6o67n4

Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation [article]

Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda
2020 arXiv   pre-print
While utilizing PWG as a vocoder to generate speech on the basis of acoustic features such as spectral and prosodic features, PWG generates high-fidelity speech.  ...  In this paper, we propose a parallel WaveGAN (PWG)-like neural vocoder with a quasi-periodic (QP) architecture to improve the pitch controllability of PWG.  ...  To achieve high fidelity SS, many neural network (NN)-based autoregressive (AR) SS models such as SampleRNN [6] and WaveNet (WN) [7] have been proposed to directly model the probability distributions  ... 
arXiv:2005.08654v2 fatcat:t3uvjh3sfvh2pipg3vkics7zpe