Analysis by Adversarial Synthesis — A Novel Approach for Speech Vocoding

Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas Maier
2019 Interspeech 2019  
In this work, we introduce a new methodology for neural speech vocoding based on generative adversarial networks (GANs).  ...  The reconstructed speech waveforms based on this approach show a higher perceptual quality than the classical vocoder counterparts according to subjective and objective evaluation scores for a dataset  ...  The main idea of such approaches for GAN-based vocoding is to generate the glottal excitation signal adversarially and then apply synthesis filtering to obtain the speech waveform.  ... 
doi:10.21437/interspeech.2019-1195 dblp:conf/interspeech/MustafaBBSM19 fatcat:4yeskn5mwbeijkjnzmkd34m7ji

A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis

Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi
2018 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this paper, we build a framework in which we can fairly compare new vocoding and acoustic modeling techniques with conventional approaches by means of a large scale crowdsourced evaluation.  ...  overcome by using advanced machine learning approaches.  ...  Among other waveform generation methods, the PML vocoder, while similar to WORLD for analysis-by-synthesis, lagged behind when using the generated acoustic features.  ... 
doi:10.1109/icassp.2018.8461452 dblp:conf/icassp/WangLTJY18 fatcat:v7utkglqnzehzmrlnngqwsgnoa

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation [article]

Rongjie Huang, Chenye Cui, Feiyang Chen, Yi Ren, Jinglin Liu, Zhou Zhao, Baoxing Huai, Zhefeng Wang
2022 arXiv   pre-print
In this work, we propose SingGAN, a generative adversarial network designed for high-fidelity singing voice synthesis.  ...  Existing neural vocoders designed for text-to-speech cannot directly be applied to singing voice synthesis because they result in glitches and poor high-frequency reconstruction.  ...  We proposed SingGAN, a novel generative adversarial network designed for high-fidelity singing voice vocoding.  ... 
arXiv:2110.07468v4 fatcat:si6q4q2od5dfzdk3kdkwsn5ctu

A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis [article]

Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi
2018 arXiv   pre-print
In this paper, we build a framework in which we can fairly compare new vocoding and acoustic modeling techniques with conventional approaches by means of a large scale crowdsourced evaluation.  ...  overcome by using advanced machine learning approaches.  ...  Among other waveform generation methods, the PML vocoder, while similar to WORLD for analysis-by-synthesis, lagged behind when using the generated acoustic features.  ... 
arXiv:1804.02549v1 fatcat:ck5l7y4srfdc7jtekiqhbw7kci

TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis [article]

Qiao Tian, Yi Chen, Zewang Zhang, Heng Lu, Linghui Chen, Lei Xie, Shan Liu
2020 arXiv   pre-print
To address this issue, we propose a novel vocoder model: TFGAN, which is adversarially learned both in time and frequency domain.  ...  TFGAN has nearly the same synthesis speed as MelGAN, but the fidelity is significantly improved by our novel learning method.  ...  Also, LPCNet [5] greatly increased the synthesis speed on the premise of guaranteeing voice quality by combining linear prediction coding with a neural vocoder.  ... 
arXiv:2011.12206v1 fatcat:pndaol5crvevhdndkguiuevcim

Lombard Speech Synthesis Using Transfer Learning in a Tacotron Text-to-Speech System

Bajibabu Bollepalli, Lauri Juvela, Paavo Alku
2019 Interspeech 2019  
approaches for speech analysis and synthesis.  ...  A vocoder is a tool that is employed in the analysis and synthesis of speech signals.  ... 
doi:10.21437/interspeech.2019-1333 dblp:conf/interspeech/BollepalliJA19 fatcat:5uz43svog5erzev5nzakdnc4qe

Creating Song from Lip and Tongue Videos with a Convolutional Vocoder

Jianyu Zhang, Pierre Roussel, Bruce Denby
2021 IEEE Access  
A novel convolutional vocoder to transform the learned parameters into an audio signal is also presented.  ...  Comparison of the convolutional vocoder to standard vocoders is made. Results can be of interest in the study of singing articulation as well as for silent speech interface research.  ...  The authors would like to thank Aurore Jaumard-Hakoun for help in reconstructing the analysis procedure used in [11].  ... 
doi:10.1109/access.2021.3050843 fatcat:i4xx6m5d2nhk3pwzeoi6p5omcq

High-quality Speech Synthesis Using Super-resolution Mel-Spectrogram [article]

Leyuan Sheng, Dong-Yan Huang, Evgeniy N. Pavlovskiy
2019 arXiv   pre-print
In speech synthesis and speech enhancement systems, mel-spectrograms need to be precise in acoustic representations.  ...  Inspired by image-to-image translation, we address this problem by using a learning-based post filter combining Pix2PixHD and ResUnet to reconstruct the mel-spectrograms together with super-resolution.  ...  In this work, we proposed a novel model to improve mel-spectrogram prediction for high-quality speech synthesis by combining the advantages of Pix2PixHD [14] and deep residual U-Net (ResUnet) [16].  ... 
arXiv:1912.01167v1 fatcat:bjcl5zcuofapxkt5f25h6gsr3q

AI based Presentation Creator With Customized Audio Content Delivery [article]

Muvazima Mansoor, Srikanth Chandar, Ramamoorthy Srinath
2021 arXiv   pre-print
In this paper, we propose an architecture to solve a novel problem statement that has stemmed more so in recent times with an increase in demand for virtual content delivery due to the COVID-19 pandemic  ...  Tacotron inspired architecture with Encoder, Synthesizer, and a Generative Adversarial Network (GAN) based vocoder, is used to convey the contents of the slides in the author's voice (or any customized  ...  We extend our thanks to PES University, who provided us a platform that helped us to team up and pursue this project. We also thank Dr. Anuradha M, for her support.  ... 
arXiv:2106.14213v1 fatcat:3pki2dg6qrhevbwhk5hniixgey

Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus [article]

Rongjie Huang, Feiyang Chen, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao
2021 arXiv   pre-print
The joint training approach effectively works in GANs for multi-singer voices modeling.  ...  To tackle the difficulty in unseen singer modeling, we propose Multi-Singer, a fast multi-singer vocoder with generative adversarial networks.  ...  AISHELL-3 [36] contains roughly 85 hours of emotion-neutral recordings spoken by 218 native Chinese speakers, which could be applied for multi-speaker speech synthesis.  ... 
arXiv:2112.10358v1 fatcat:nmnbeshurbb7xlhlvki4ogguye

Avocodo: Generative Adversarial Network for Artifact-free Vocoder [article]

Taejun Bak, Junmo Lee, Hanbin Bae, Jinhyeok Yang, Jae-Sung Bae, Young-Sun Joo
2022 arXiv   pre-print
Therefore, in this paper, we investigate the relationship between these artifacts and GAN-based neural vocoders and propose a GAN-based neural vocoder, called Avocodo, that allows the synthesis of high-fidelity  ...  The experimental results show that Avocodo outperforms conventional GAN-based neural vocoders in both speech and singing voice synthesis tasks and can synthesize artifact-free speech.  ...  The CoMBD performs multi-scale analysis in a balanced manner and suppresses artifacts with a novel structure.  ... 
arXiv:2206.13404v2 fatcat:wqpkmzyjnzcqbnidq4bfm3pjay

Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition [article]

Zhengxi Liu, Yanmin Qian
2021 arXiv   pre-print
Recent studies have shown that neural vocoders based on generative adversarial networks (GANs) can generate high-quality audio.  ...  To reduce the computation of upsampling layers, we propose a new GAN-based neural vocoder called Basis-MelGAN where the raw audio samples are decomposed with a learned basis and their associated weights  ...  We also examine the proposed models' effectiveness when applied to an end-to-end speech synthesis pipeline, which is an acoustic model for text to mel spectrogram and a neural vocoder for mel spectrogram  ... 
arXiv:2106.13419v1 fatcat:f2rhfgz4trffpkr7p2l6e3ao5e

A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data

Xiaohai Tian, Eng Siong Chng, Haizhou Li
2019 Interspeech 2019  
In a typical voice conversion system, a vocoder is commonly used for speech-to-features analysis and features-to-speech synthesis. However, the vocoder can be a source of speech quality degradation.  ...  This paper presents a novel approach to voice conversion using WaveNet for non-parallel training data.  ...  The proposed approach does not rely on the vocoder features for conversion, which 1) avoids the feature analysis and speech synthesis problems arising from vocoding; 2) reduces the feature mismatch problem  ... 
doi:10.21437/interspeech.2019-1514 dblp:conf/interspeech/TianC019 fatcat:ebmoa63td5gbzlruatksmamnqq

Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks [article]

Santiago Pascual, Antonio Bonafonte, Joan Serrà, Jose A. Gonzalez
2018 arXiv   pre-print
In contrast, this paper describes an end-to-end neural approach for estimating a fully-voiced speech waveform from whispered alaryngeal speech.  ...  By adapting our previous work in speech enhancement with generative adversarial networks, we develop a speaker-dependent model to perform whispered-to-voiced speech conversion.  ...  The analysis-by-synthesis approach follows a similar methodology to code-excited linear prediction [7, 8, 9].  ... 
arXiv:1808.10687v2 fatcat:oaofgzqf3zdurpszupynevvuha

NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation

Tao Wang, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen
2022 IEEE/ACM Transactions on Audio Speech and Language Processing  
To combine the advantages of two vocoders, inspired by the traditional deterministic plus stochastic model, this paper proposes a novel neural vocoder named NeuralDPS which can retain high speech quality  ...  The traditional vocoders have the advantages of high synthesis efficiency, strong interpretability, and speech editability, while the neural vocoders have the advantage of high synthesis quality.  ...  This paper has proposed a novel neural vocoder named NeuralDPS, which has the characteristics of high speech quality, high synthesis efficiency and noise controllability.  ... 
doi:10.1109/taslp.2022.3140480 fatcat:koyvmq64erg3jofaf6mhlz3kle