Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks
2018
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
This paper proposes a method for generating speech from filterbank mel-frequency cepstral coefficients (MFCCs), which are widely used in speech applications such as automatic speech recognition (ASR) but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from MFCCs with an autoregressive recurrent neural net. Second, the spectral envelope information contained in MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to
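The abstract's second step, turning the spectral envelope in an MFCC vector into an all-pole filter, can be illustrated with a rough sketch. This is not the paper's exact procedure. It assumes the MFCCs are an orthonormal DCT-II of natural-log mel-band energies, replaces a proper filterbank inversion with simple linear interpolation of band energies onto FFT bins, and uses an impulse train as a crude stand-in for the learned excitation. All function names (`mfcc_to_allpole`, `allpole_filter`, etc.) and default parameters are hypothetical.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def idct_ortho(c, n_out):
    # Inverse of an orthonormal DCT-II: zero-pads the truncated cepstrum c
    # and returns n_out log mel-band energies.
    out = []
    for m in range(n_out):
        s = c[0] / math.sqrt(n_out)
        for k in range(1, len(c)):
            s += c[k] * math.sqrt(2.0 / n_out) * math.cos(math.pi * k * (m + 0.5) / n_out)
        out.append(s)
    return out


def interp(x, xs, ys):
    # Piecewise-linear interpolation, clamped at both ends.
    if x <= xs[0]:
        return ys[0]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]


def power_to_autocorr(power, n_fft, n_lags):
    # Inverse DFT of a real, even power spectrum given as n_fft//2 + 1 bins.
    half = n_fft // 2
    r = []
    for k in range(n_lags):
        s = power[0] + power[half] * math.cos(math.pi * k)
        for j in range(1, half):
            s += 2.0 * power[j] * math.cos(2.0 * math.pi * j * k / n_fft)
        r.append(s / n_fft)
    return r


def levinson_durbin(r, order):
    # Solves the autocorrelation normal equations for A(z) = 1 + sum_j a_j z^-j.
    # Returns (a, err), where err is the final prediction-error power.
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient; |k| < 1 for a positive spectrum
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def mfcc_to_allpole(mfcc, n_mels=40, n_fft=512, sr=16000, order=16):
    # Hypothetical analysis settings; the paper's actual values are not given here.
    log_mel = idct_ortho(mfcc, n_mels)
    mel_max = hz_to_mel(sr / 2.0)
    centers = [mel_max * (i + 1) / (n_mels + 1) for i in range(n_mels)]
    # Crude envelope: interpolate band log-energies onto linear FFT bins.
    power = [math.exp(interp(hz_to_mel(j * sr / n_fft), centers, log_mel))
             for j in range(n_fft // 2 + 1)]
    r = power_to_autocorr(power, n_fft, order + 1)
    return levinson_durbin(r, order)


def allpole_filter(a, gain, x):
    # y[n] = gain * x[n] - sum_{j>=1} a[j] * y[n - j]
    y = []
    for n in range(len(x)):
        v = gain * x[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                v -= a[j] * y[n - j]
        y.append(v)
    return y


# Usage: an impulse train at 160 Hz (16 kHz sampling) through the estimated filter.
a, err = mfcc_to_allpole([4.0, -1.2, 0.6, -0.3] + [0.0] * 9)
pulses = [1.0 if n % 100 == 0 else 0.0 for n in range(400)]
frame = allpole_filter(a, math.sqrt(err), pulses)
```

Going through an autocorrelation sequence and the Levinson-Durbin recursion guarantees a minimum-phase (stable) all-pole filter whenever the reconstructed power spectrum is positive, which the exponentiated log energies always are. A flat MFCC vector (only `c[0]` nonzero) yields a flat spectrum and therefore filter coefficients that are essentially zero.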
doi:10.1109/icassp.2018.8461852
dblp:conf/icassp/JuvelaBWKAYA18
fatcat:jwcgnd73lrh6ljq4ozmvx3pbje