TTS-driven Embodied Conversation Avatar for UMB-SmartTV

Matej Rojc, Zdravko Kačič, Marko Presker, Izidor Mlakar
2021 (2021-04-14), International journal of computers and communications, North Atlantic University Union (NAUN)
When human-TV interaction is performed only through a remote controller and mobile devices, the interactions tend to be mechanical, dreary and uninformative. To achieve more advanced interaction that is closer to human-human interaction, we introduce virtual agent technology as a feedback interface. Verbal and co-verbal gestures are linked through complex mental processes, and although they represent different sides of the same mental process, their formulations are quite different. Namely, verbal ... on is bound by rules and grammar, whereas gestures are influenced by emotions, personality, etc. In this paper, a TTS-driven behavior generation system is proposed as a more advanced interface for smart IPTV platforms. The system is implemented as a distributed non-IPTV service and integrated into UMB-SmartTV in a service-oriented fashion. The behavior generation system fuses speech and gesture production models by using finite-state machines (FSMs) and heterogeneous relation graph (HRG) structures. The shape and alignment of co-verbal movement are selected from linguistic features, which can be extracted from arbitrary input text, and prosodic features, which are predicted within several processing steps in the TTS engine. Finally, the generated speech and co-verbal behavior are animated by an embodied conversational agent (ECA) engine and presented to the user within the UMB-SmartTV user interface.
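The abstract describes the pipeline only at a high level. As a rough illustration of how linguistic features (part of speech) and prosodic features (pitch accent, predicted word timing) might jointly drive the choice and timing of a co-verbal gesture, the Python sketch below pairs a minimal HRG-like utterance structure with a two-state FSM. Everything in it is an assumption made for illustration: the classes `Item`, `Utterance`, `State`, `GestureEvent`, the functions `plan_gestures` and `build_example`, the feature names, the shape lexicon and the 0.3 s retraction time are hypothetical. None of them are taken from the paper or from the authors' TTS engine.

```python
"""Hypothetical sketch: HRG-like utterance + FSM co-verbal gesture planner.

Not the authors' implementation; all names, features and rules are
illustrative assumptions.
"""
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


@dataclass(eq=False)
class Item:
    """A linguistic unit (here: a word) that can belong to several relations."""
    text: str
    feats: dict = field(default_factory=dict)


@dataclass
class Utterance:
    """Minimal HRG-like container: relation name -> ordered list of items.

    The 'Phrase' relation stores phrase items whose 'daughters' feature holds
    the very same Item objects that appear in the 'Word' relation.
    """
    relations: dict = field(default_factory=dict)

    def rel(self, name: str) -> list:
        return self.relations.setdefault(name, [])


class State(Enum):
    REST = "rest"      # hands at rest, no gesture in progress
    ACTIVE = "active"  # at least one stroke emitted in the current phrase


@dataclass
class GestureEvent:
    phase: str    # "stroke" or "retraction"
    shape: str    # hypothetical gesture-shape label
    start: float  # seconds, taken from the word timing features
    end: float


# Hypothetical lexicon, POS set and timing; not taken from the paper.
SHAPE_LEXICON = {"you": "point_forward", "big": "open_palms_wide", "no": "head_shake"}
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV", "PRON"}
RETRACTION_S = 0.3


def plan_gestures(utt: Utterance) -> list[GestureEvent]:
    """Walk the phrases with a two-state FSM and emit co-verbal events."""
    events: list[GestureEvent] = []
    state = State.REST
    for phrase in utt.rel("Phrase"):
        words = phrase.feats["daughters"]
        for w in words:
            accented = w.feats.get("accent", False)      # prosodic feature
            content = w.feats.get("pos") in CONTENT_POS  # linguistic feature
            if accented and content:
                # Shape: lexicon lookup, falling back to a simple beat.
                shape = SHAPE_LEXICON.get(w.text.lower(), "beat")
                # Alignment: the stroke spans the accented word.
                events.append(GestureEvent("stroke", shape, w.feats["start"], w.feats["end"]))
                state = State.ACTIVE
        if state is State.ACTIVE:
            # Phrase boundary: ACTIVE -> REST via a retraction.
            end = words[-1].feats["end"]
            events.append(GestureEvent("retraction", "rest", end, end + RETRACTION_S))
            state = State.REST
    return events


def build_example() -> Utterance:
    """One phrase, 'Is this big enough', with assumed TTS predictions."""
    spec = [("Is", "AUX", False, 0.00, 0.15),
            ("this", "PRON", False, 0.15, 0.35),
            ("big", "ADJ", True, 0.35, 0.70),
            ("enough", "ADV", False, 0.70, 1.10)]
    words = [Item(t, {"pos": p, "accent": a, "start": s, "end": e})
             for t, p, a, s, e in spec]
    utt = Utterance()
    utt.rel("Word").extend(words)
    utt.rel("Phrase").append(Item("phrase_1", {"daughters": words}))
    return utt


if __name__ == "__main__":
    for event in plan_gestures(build_example()):
        print(event)
```

Sharing the same word items between the `Word` and `Phrase` relations mirrors the HRG idea that one linguistic unit can be reached through several structures. A realistic system would add further relations (syllables, segments, intonation) and richer FSM states covering the usual gesture phases, for example preparation, stroke, hold and retraction.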
doi:10.46300/91013.2021.15.1 · fatcat:6wo2rrbp3ffqdib7gsfcdvjyri
