Investigating the use of recurrent motion modelling for speech gesture generation

Ylva Ferstl, Rachel McDonnell
2018 Proceedings of the 18th International Conference on Intelligent Virtual Agents - IVA '18  
The growing use of virtual humans demands generating increasingly realistic behavior for them while minimizing cost and time. Gestures are a key ingredient for realistic and engaging virtual agents, and consequently automated gesture generation has been a popular area of research. So far, good gesture generation has relied on explicit formulation of if-then rules and probabilistic modelling of annotated features. Machine learning approaches have yielded only marginal success, indicating a high complexity of the speech-to-motion learning task. In this work, we explore transfer learning, drawing on previous motion modelling research, to improve learning outcomes for gesture generation from speech. We use a recurrent network with an encoder-decoder structure that takes in prosodic speech features and generates a short sequence of gesture motion. We pre-train the network with a motion modelling task. We recorded a large multimodal database of conversational speech for the purpose of this work.
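
The encoder-decoder structure described in the abstract can be illustrated with a minimal, untrained sketch. Everything here is an assumption made for illustration and is not taken from the paper: the plain tanh recurrent cell, the dimensions, the class name `EncoderDecoder`, and the helper `transfer_decoder`. The abstract does not say which weights are reused after pre-training, so copying the decoder is only one possible reading of the transfer step.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-based matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in, w_rec, x, h):
    """One tanh recurrent step: h' = tanh(W_in x + W_rec h)."""
    a = matvec(w_in, x)
    b = matvec(w_rec, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


class EncoderDecoder:
    """Hypothetical recurrent encoder-decoder (illustrative, untrained)."""

    def __init__(self, in_dim, pose_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.pose_dim = pose_dim
        self.enc_in = make_matrix(hidden_dim, in_dim, rng)
        self.enc_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.dec_in = make_matrix(hidden_dim, pose_dim, rng)
        self.dec_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.out = make_matrix(pose_dim, hidden_dim, rng)

    def encode(self, frames):
        """Fold a sequence of input feature frames into one hidden state."""
        h = [0.0] * self.hidden_dim
        for x in frames:
            h = rnn_step(self.enc_in, self.enc_rec, x, h)
        return h

    def decode(self, h, start_pose, n_steps):
        """Generate n_steps poses, feeding each predicted pose back in."""
        pose = list(start_pose)
        poses = []
        for _ in range(n_steps):
            h = rnn_step(self.dec_in, self.dec_rec, pose, h)
            pose = matvec(self.out, h)
            poses.append(pose)
        return poses

    def generate(self, frames, start_pose, n_steps):
        return self.decode(self.encode(frames), start_pose, n_steps)


def transfer_decoder(pretrained, target):
    """Assumed transfer step: copy decoder weights into the target model."""
    for name in ("dec_in", "dec_rec", "out"):
        copied = [row[:] for row in getattr(pretrained, name)]
        setattr(target, name, copied)
```

In this sketch, a model whose encoder reads pose frames would stand in for the motion modelling task, and `transfer_decoder` would initialise the decoder of a second model whose encoder reads prosodic feature frames. Training itself is omitted.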
doi:10.1145/3267851.3267898 dblp:conf/iva/FerstlM18 fatcat:qhbirpnbuzac3mytoqhosfz5pe