Implementing a non-modular theory of language production in an embodied conversational agent [chapter]

Timo Sowa, Stefan Kopp, Susan Duncan, David McNeill, Ipke Wachsmuth
2008 Embodied Communication in Humans and Machines  
Producing language in spoken discourse is virtually impossible without gestures. Growth Point (GP) theory (McNeill 1992, 2005; McNeill and Duncan 2000) articulates a cognitive model of language production that acknowledges the crucial role of embodiment for speaking, in that gestures and speech are both considered integral to language. The model is founded on empirical examination of extended natural discourse, emphasizing fine-grained analysis of synchronous, coexpressive speech and gestures. One increasingly popular method to test and refine cognitive models of language production is computer simulation of multimodal behavior in embodied conversational agents, hereafter ECAs (Cassell et al. 2000; see also Poggi and Pelachaud, this volume). Since an ECA always "embodies" a theory, varying the technical model according to different theoretical assumptions has a direct impact on its communicative behavior. The effects of manipulating model parameters may then be compared to observations of human behavior and can further inform the modeling effort. Conversely, confronting an ECA with theoretical psychological concepts like those implied by GP theory can elucidate limits on the computational modeling of human functioning, and can motivate further improvements of ECAs and their communicative behavior. The aim of this chapter is to discuss and assess the feasibility of operationalizing GP theory's model of language production in an ECA. GP theory and computational ECA models have so far been considered largely contradictory in a number of central assumptions, the most crucial being the rejection or adoption of a modular structure of the language production system. We first sketch the cornerstones of non-modular GP theory and its empirical basis. Second, we give an overview of the gesture and speech production models currently realized in ECAs, and we discuss their potential and limitations with respect to the characteristics of natural speech and gesture they can account for. Such agent architectures are largely inspired by modularist views of speech production, such as Levelt's "Blueprint for the Speaker" (Levelt 1989). We contrast these theoretical
doi:10.1093/acprof:oso/9780199231751.003.0018 fatcat:ycqedblwgjamtnkx7nvwjisnki