Learning Backchannel Prediction Model from Parasocial Consensus Sampling: A Subjective Evaluation [chapter]

Lixing Huang, Louis-Philippe Morency, Jonathan Gratch
2010 Lecture Notes in Computer Science  
Backchannel feedback is an important kind of nonverbal feedback in face-to-face interaction that signals a person's interest, attention, and willingness to keep listening. Learning to predict when to give such feedback is one of the keys to creating natural and realistic virtual humans. Prediction models are traditionally learned from large corpora of annotated face-to-face interactions, but this approach has several limitations. Previously, we proposed a novel data collection method, Parasocial Consensus Sampling, which addresses these limitations. In this paper, we show that data collected in this manner can produce effective learned models. A subjective evaluation shows that the virtual human driven by the resulting probabilistic model significantly outperforms a previously published rule-based agent in terms of rapport, perceived accuracy, and naturalness, and in some cases it even outperforms a virtual human driven by real listeners' behavior.

Indeed, these and related studies suggest that a virtual human's behavior may be more important than its appearance in achieving social effects [8]. Although early research on virtual humans relied on hand-crafted algorithms to generate nonverbal behaviors, informed by psychological theories or personal observations of face-to-face interaction [4], recent scholarship has seen an explosion of interest in data-driven approaches that automatically learn virtual human behaviors from annotated corpora of human face-to-face interactions. Several systems now exist that automatically learn a range of nonverbal behaviors, including backchannel feedback [2], conversational gestures [9,15], and turn-taking cues [10]. It is widely assumed that natural human-to-human interaction constitutes the ideal dataset from which to learn virtual human behaviors; however, such data has drawbacks. First, natural data can be expensive and time-consuming to collect. Second, human behavior is variable, so some behavior samples may conflict with the social effect we want the virtual human to produce. Finally, each instance of face-to-face interaction illustrates only how one particular individual responds to another; such data gives no insight into how well those responses generalize across individuals.
Rather than simply exploring more powerful learning algorithms that might overcome these drawbacks, we argue that attention should also be directed at innovative methods for collecting behavioral data. Recently, we proposed a novel data collection approach called Parasocial Consensus Sampling (PCS) [1] to inform virtual human nonverbal behavior generation. Instead of interacting face-to-face, participants were guided through a "parasocial" interaction in which they attempted to produce natural nonverbal behaviors in response to pre-recorded videos of human interaction partners. Through this method we were able to quickly collect large amounts of behavioral data; more importantly, we were able to assess how multiple individuals might respond to the identical social situation. These multiple perspectives afford the possibility of driving virtual humans with a consensus view on how one should respond, rather than simply concatenating many idiosyncratic responses. A test of this approach, applied to the problem of generating listener nonverbal feedback, showed that 1) participants felt comfortable producing behavior in this manner and 2) the resulting consensus was perceived as more accurate and more effective than natural feedback (i.e., feedback from the actual listener in the original face-to-face conversation). Although this was a promising first step, it remained to be demonstrated that consensus data can be used to train an effective predictive model. In this article, we take that next logical step in demonstrating the power of PCS: using consensus data, we train a predictive model of listener backchannel feedback. We compare the performance of this model against our previous Rapport Agent, which generated behaviors according to a hand-crafted mapping.
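To make the consensus idea concrete, the sketch below shows one simple way multiple participants' parasocial feedback could be pooled into a consensus signal. This is an illustrative assumption, not the paper's actual algorithm: it supposes each participant's backchannel feedback is logged as (start, end) intervals in seconds, discretizes time into frames, and marks a frame as a consensus backchannel moment when at least a threshold fraction of participants gave feedback there. The function name and parameters are hypothetical.

```python
def consensus_signal(participant_intervals, duration, frame=0.1, threshold=0.5):
    """Pool feedback from several participants into a consensus signal.

    participant_intervals: one list of (start, end) feedback intervals
        (seconds) per participant, all responding to the same video.
    Returns a per-frame boolean list: True where at least `threshold`
    of participants were giving backchannel feedback.
    """
    n_frames = int(round(duration / frame))
    votes = [0] * n_frames
    for intervals in participant_intervals:
        for start, end in intervals:
            # Mark every frame this participant's interval covers.
            first = int(round(start / frame))
            last = min(int(round(end / frame)) + 1, n_frames)
            for i in range(first, last):
                votes[i] += 1
    n = len(participant_intervals)
    return [v / n >= threshold for v in votes]


# Three hypothetical participants watching the same 5-second clip:
# two nod around t = 1.0-1.6 s, one nods alone around t = 4 s.
participants = [[(1.0, 1.5)], [(1.1, 1.6)], [(4.0, 4.2)]]
signal = consensus_signal(participants, duration=5.0)
```

With a 0.5 threshold, only the region where two of the three participants overlap survives as consensus; the idiosyncratic nod near 4 s is filtered out, which is exactly the property that distinguishes consensus data from any single listener's behavior.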
Our subjective evaluation shows that the virtual human driven by this probabilistic model performs significantly better than the Rapport Agent [6] in terms of rapport, perceived accuracy, and naturalness, and in some cases it even outperforms the virtual human driven by real listeners' behavior.
doi:10.1007/978-3-642-15892-6_17