A copy of this work was preserved in the Wayback Machine; the capture dates from 2021.
A convolutional neural network and a deep autoencoder are used to predict Line Spectral Frequencies (LSFs), F0, and a voiced/unvoiced flag in singing data, using as input only ultrasound images of the tongue and visual images of the lips. A novel convolutional vocoder that transforms the learned parameters into an audio signal is also presented. Spectral distortion of the predicted Line Spectral Frequencies is reduced compared to an earlier study that used handcrafted features and multilayer perceptrons.

doi:10.1109/access.2021.3050843
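Before any LPC-style synthesis, predicted LSFs must be converted back to linear-prediction filter coefficients. As a minimal illustration of that conversion step only (not the paper's convolutional vocoder), the sketch below reconstructs the LPC polynomial from an even number of LSFs via the standard symmetric/antisymmetric polynomial decomposition; the function name and the parity assignment of frequencies to the two polynomials are my own assumptions.

```python
import numpy as np

def lsf_to_lpc(lsf):
    """Convert Line Spectral Frequencies (radians, strictly ascending,
    in (0, pi), even count p) to LPC coefficients a = [1, a1, ..., ap].

    Uses A(z) = (P(z) + Q(z)) / 2, where P is the palindromic (sum)
    polynomial and Q the antipalindromic (difference) polynomial whose
    unit-circle roots sit at the LSFs. Assumption: the even-indexed
    (smaller) frequencies belong to P, the odd-indexed ones to Q.
    """
    lsf = np.asarray(lsf, dtype=float)
    p = lsf.size
    wP, wQ = lsf[0::2], lsf[1::2]

    def poly_from_angles(w):
        # Each angle contributes a conjugate root pair on the unit circle:
        # (1 - e^{jw} z^-1)(1 - e^{-jw} z^-1) = 1 - 2 cos(w) z^-1 + z^-2
        c = np.array([1.0])
        for wi in w:
            c = np.convolve(c, [1.0, -2.0 * np.cos(wi), 1.0])
        return c

    # For even p, P(z) carries the fixed root at z = -1, Q(z) at z = +1.
    P = np.convolve(poly_from_angles(wP), [1.0, 1.0])
    Q = np.convolve(poly_from_angles(wQ), [1.0, -1.0])
    # The degree-(p+1) terms of P and Q cancel, leaving A of degree p.
    a = 0.5 * (P + Q)
    return a[: p + 1]
```

A quick sanity check is the round trip: build `a` from a set of LSFs, re-form P and Q from `a` by adding and subtracting its reversed, zero-padded coefficients, and confirm that the positive interior root angles of those polynomials recover the original LSFs.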