Speech Prosody 2016
We propose a method ("FONN") for F0 contour generation for text-to-speech synthesis. Training speech is automatically segmented into left-headed feet, annotated with syllable start/end times, foot position in the sentence, and the number of syllables in the foot. During training, we fit a superpositional intonation model comprising phrase curves and accent curves associated with feet. We propose to use a neural network for model parameter estimation. We tested the method against the HMM-based
doi:10.21437/speechprosody.2016-207
fatcat:756fz2k7u5ayrkyz4hdbs3u5fi
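To make the idea of a superpositional intonation model concrete, here is a minimal Python sketch, not the FONN implementation. It builds an F0 contour by adding one accent curve per foot to a phrase curve. The curve shapes (a linear declination for the phrase, a raised-cosine bump for each foot), the function names, and all numeric values are assumptions chosen for illustration; the abstract does not specify them. In the proposed method the curve parameters would be estimated by a neural network, whereas here they are fixed by hand.

```python
import math


def phrase_curve(t, duration, start_hz=120.0, end_hz=90.0):
    """Hypothetical phrase curve: linear declination over the phrase."""
    return start_hz + (end_hz - start_hz) * (t / duration)


def accent_curve(t, foot_start, foot_end, amplitude_hz):
    """Hypothetical accent curve: raised-cosine bump over one foot, zero outside it."""
    if t < foot_start or t > foot_end:
        return 0.0
    phase = (t - foot_start) / (foot_end - foot_start)
    return amplitude_hz * 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))


def f0_contour(times, duration, feet):
    """Superpose the phrase curve and one accent curve per foot.

    feet is a list of (start_s, end_s, amplitude_hz) tuples (illustrative values).
    """
    return [
        phrase_curve(t, duration)
        + sum(accent_curve(t, start, end, amp) for start, end, amp in feet)
        for t in times
    ]


# Two feet in a one-second phrase; amplitudes are made up for the example.
feet = [(0.0, 0.5, 30.0), (0.5, 1.0, 20.0)]
times = [0.0, 0.25, 0.5, 0.75, 1.0]
contour = f0_contour(times, 1.0, feet)
```

Each accent curve peaks at the middle of its foot and returns to zero at the foot boundaries, so the contour falls back onto the phrase curve between feet. Whether the components are added in Hz or in a log-frequency domain is a design choice the abstract leaves open; this sketch adds them in Hz for simplicity.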