A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Vocoder-Based Speech Synthesis from Silent Videos
2020
Interspeech 2020
Both acoustic and visual information influence human perception of speech. For this reason, the lack of audio in a video sequence results in extremely low speech intelligibility for untrained lip readers. In this paper, we present a way to synthesise speech from the silent video of a talker using deep learning. The system learns a mapping function from raw video frames to acoustic features and reconstructs the speech with a vocoder synthesis algorithm. To improve speech reconstruction …
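As a rough illustration of the pipeline the abstract describes, the sketch below maps video frames to per-frame acoustic features with a small CNN+GRU and resynthesises a waveform with a vocoder. The abstract names neither the vocoder nor the feature set; the WORLD vocoder (via the pyworld package), a 513-bin spectral envelope and aperiodicity plus F0 and a voicing flag, and all layer sizes are illustrative assumptions, not the authors' architecture.

import numpy as np
import pyworld
import torch
import torch.nn as nn

class VideoToAcoustic(nn.Module):
    """Maps a silent video clip (B, T, 1, H, W) to per-frame acoustic features."""
    def __init__(self, feat_dim: int = 2 * 513 + 2, hidden: int = 256):
        super().__init__()
        # Per-frame visual encoder (hypothetical sizes; grayscale mouth crops).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Temporal model over the frame sequence.
        self.rnn = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        # Predicts spectral envelope + aperiodicity + (F0, voicing flag).
        self.head = nn.Linear(2 * hidden, feat_dim)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        b, t = video.shape[:2]
        z = self.encoder(video.flatten(0, 1)).view(b, t, -1)
        h, _ = self.rnn(z)
        return self.head(h)  # (B, T, feat_dim)

def synthesise(features: np.ndarray, fs: int = 16000) -> np.ndarray:
    """Resynthesises a waveform from predicted features with the WORLD vocoder.

    Assumes the envelope/aperiodicity values are already in WORLD's linear
    domain and that the feature rate matches the vocoder frame rate (in
    practice the video frame rate would be upsampled to match).
    """
    n = 513  # bins of a 1024-point FFT, WORLD's default at fs = 16 kHz
    sp = np.ascontiguousarray(features[:, :n], dtype=np.float64)
    ap = np.ascontiguousarray(features[:, n:2 * n], dtype=np.float64)
    f0 = np.ascontiguousarray(features[:, 2 * n], dtype=np.float64)
    f0 = f0 * (features[:, 2 * n + 1] > 0.5)  # zero F0 on unvoiced frames
    return pyworld.synthesize(f0, sp, ap, fs)

The multi-task text-prediction branch mentioned in the abstract is omitted here; it would add a second output head trained jointly with the acoustic one.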
doi:10.21437/interspeech.2020-1026
dblp:conf/interspeech/MichelsantiSHGT20
fatcat:25stjvbk3vbctkokv3e6pi32di