Lightly supervised GMM VAD to use audiobook for speech synthesiser
2013
2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Audiobooks have attracted attention as promising data for training Text-to-Speech (TTS) systems. However, they usually come without an alignment between the audio and the text, and the audio is typically divided only into chapter-length units. In practice, the audio and text must be aligned before they can be used to build TTS synthesisers. This alignment is time-consuming, involves manual labor, and requires people skilled in speech processing.
doi:10.1109/icassp.2013.6639220
dblp:conf/icassp/MamiyaYWCKS13
fatcat:ostjrzz36fe7xgc227h6ddhpbi