A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Animating Face using Disentangled Audio Representations
2020
2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
Previous methods for audio-driven talking head generation assume the input audio to be clean and spoken in a neutral tone. As we show empirically, these systems can easily be broken simply by adding certain background noise to the utterance or changing its emotional tone (for example, to sad). To make talking head generation robust to such variations, we propose an explicit audio representation learning framework that disentangles audio sequences into various factors such as phonetic content, emotional …
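The disentanglement idea in the abstract can be illustrated with a toy sketch: one audio feature vector is mapped by two separate heads into a "content" embedding and an "emotion" embedding. This is purely hypothetical code for illustration; the paper's actual model is a learned neural network with disentanglement objectives, and none of the names or dimensions below come from the paper.

```python
import random

def linear(x, w):
    """Apply a simple linear map; w is a matrix given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

class DisentanglingEncoder:
    """Toy encoder (illustrative only, not the paper's architecture):
    two independent linear heads split one audio feature vector into
    a phonetic-content embedding and an emotion embedding."""

    def __init__(self, in_dim, content_dim, emotion_dim, seed=0):
        rng = random.Random(seed)
        # Randomly initialized weights stand in for trained parameters.
        self.w_content = [[rng.uniform(-1, 1) for _ in range(in_dim)]
                          for _ in range(content_dim)]
        self.w_emotion = [[rng.uniform(-1, 1) for _ in range(in_dim)]
                          for _ in range(emotion_dim)]

    def encode(self, features):
        # Each head sees the same input but produces a separate factor.
        return linear(features, self.w_content), linear(features, self.w_emotion)

enc = DisentanglingEncoder(in_dim=4, content_dim=3, emotion_dim=2)
content, emotion = enc.encode([0.1, -0.2, 0.3, 0.05])
print(len(content), len(emotion))  # 3 2
```

In the actual framework, such factors would be learned so that the content embedding drives lip motion while nuisance factors (emotion, noise) are modeled separately, making generation robust to their variation.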
doi:10.1109/wacv45572.2020.9093527
dblp:conf/wacv/MittalW20
fatcat:cdiijakmnnbctf4c4msnanrddm