A Mind Model for Multimodal Communicative Creatures and Humanoids

Kristinn R. Thórisson
Applied Artificial Intelligence, 1999
This paper presents a computational model of real-time task-oriented dialogue skills. The architecture, termed Ymir, bridges multimodal perception and multimodal action, and supports the creation of autonomous computer characters that afford full-duplex, real-time face-to-face interaction with a human. Ymir has been prototyped in software, and a humanoid created, called Gandalf, capable of fluid multimodal dialogue.
Ymir demonstrates several new ideas in the creation of communicative computer agents, including perceptual integration of multimodal events, distributed planning and decision making, an explicit handling of real-time, and layered input analysis and motor control with human characteristics. This paper describes the architecture and explains its main elements. Examples of implementation and performance are given, and the architecture's limitations and possibilities are discussed.

Artificial agents in the current context can be thought of as (1) having a body, (2) having task-related knowledge, (3) having goals, usually conveyed to them by their users, and (4) having communicative skills appropriate for the tasks they perform. A vacuum-cleaning robot is a good example of a situated agent with a body, knowledge about a task, and a specific goal to serve its user. How users convey their wishes and intent to the agent is an issue of human-computer interface design. For example, Chin (1991) describes an agent that gives users advice about UNIX commands during interactive sessions. This system is a text-based natural language system using a keyboard as the input device and written English as the language of communication. Such agents rely on the traditional interaction hardware of keyboard, mouse, and monitor to interact with their users.

In contrast, the model here is face-to-face interaction between humans: we want to communicate naturally with the virtual character, keeping the communication channel as broad as possible. The term "face-to-face" refers not only to the presence of faces but in general to embodied, co-present, co-temporal, non-mediated communication. This means that the interaction is multimodal. There are several tasks where human-like interaction skills can make the job of "programming" agents more straightforward: if our vacuum-cleaning robot has multimodal perception and multimodal interpretive skills, we can simply point into a corner and tell it to "Vacuum that corner tomorrow." In this case the communicative vacuum cleaner would have very human communication skills. Laurel (1992) lists some other chores that agents with such communicative skills might do well: coaching, tutoring, providing help, following orders, reminding, advising, and entertaining, e.g. playing against, playing with, and performing.

We already know a lot about how to represent the topic knowledge needed for many tasks. What has been missing is a general architecture that can integrate the critical pieces of multimodal real-time dialogue in a computer character with one or more of the above skills. This paper describes Ymir, a model well suited for creating autonomous creatures capable of humanlike communication with real users. A prototype agent called Gandalf, created in the Ymir architecture, will also be described (FIGURE 1). Ymir does essentially what Fehling et al. (1988) call resource-bounded problem solving. The problem is task-oriented dialogue; the resources are time, information, and computational power. Dialogue has the additional quality of encompassing many features inherent in other tasks: dialogue planning, gaze control, turn-taking, and multimodal actions all have an equivalent in complex tasks where actions on several levels of detail, such as movement of arms, sensors, and body, have to be coordinated to meet high-level goals.
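The paper names the architectural ingredients (perceptual integration of multimodal events, layered analysis, explicit real-time handling) without giving code. As a rough illustration only, and not the actual Ymir implementation, the sketch below shows one way a layered, time-budgeted perception-to-action loop could be organized. All names (PerceptualEvent, Layer, integrate, the rule lambdas) are hypothetical and invented here.

```python
import time
from dataclasses import dataclass, field

@dataclass
class PerceptualEvent:
    modality: str      # e.g. "gaze", "gesture", "speech"
    content: str
    timestamp: float = field(default_factory=time.monotonic)

class Layer:
    """One decision layer; faster layers get shorter time budgets,
    echoing the idea of layered input analysis with real-time limits."""
    def __init__(self, name, deadline_ms, rules):
        self.name = name
        self.deadline = deadline_ms / 1000.0
        self.rules = rules  # list of (predicate, action-name) pairs

    def decide(self, events):
        start = time.monotonic()
        for predicate, action in self.rules:
            if time.monotonic() - start > self.deadline:
                break  # budget exhausted; slower layers may still act
            if predicate(events):
                return action
        return None

def integrate(events, window_s=1.0):
    """Fuse only events recent enough to belong to one multimodal percept."""
    now = time.monotonic()
    return [e for e in events if now - e.timestamp <= window_s]

# A fast reactive layer and a slower deliberative one, deciding over
# the same fused multimodal event stream.
reactive = Layer("reactive", deadline_ms=100, rules=[
    (lambda ev: any(e.modality == "gaze" for e in ev), "return-gaze"),
])
deliberative = Layer("deliberative", deadline_ms=1000, rules=[
    (lambda ev: any(e.modality == "speech" for e in ev), "plan-reply"),
])

if __name__ == "__main__":
    events = [PerceptualEvent("gaze", "user-looks-at-agent"),
              PerceptualEvent("speech", "vacuum that corner tomorrow")]
    fused = integrate(events)
    for layer in (reactive, deliberative):
        action = layer.decide(fused)
        if action:
            print(f"{layer.name} -> {action}")
```

The point of the layering is that a cheap rule (returning gaze) can fire within a tight deadline while a costlier one (planning a reply) runs under a looser budget, so the agent never goes unresponsive while it deliberates.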
On the practical side, Ymir is intended to be used for creating synthetic characters, softbots, even robots, whose purpose in life is to receive commands from humans through face-to-face interaction, ask questions when appropriate, but otherwise do the job as best their knowledge allows. On the theoretical side, Ymir could be used to test theories about human discourse, because it provides the possibility to turn certain dialogue actions on and off at will, something that was impossible to do before, even with a skilled actor. This would for example be very useful in testing theories of multimodal miscommunication (cf. Traum & Dillenbourg, 1996), the grounding process (cf. Clark, 1992), and collaboration principles in dialogue (Grice, 1989). The current version of Gandalf has already shown the value of the system for this purpose (Cassell & Thórisson, 1998). Isolated parts of Ymir have been presented elsewhere (Thórisson, 1998, 1997), and the theoretical underpinnings and assumptions of the model can be found in Thórisson (1996, 1995). Here I give a comprehensive overview of the model, give examples of an implementation, and discuss how the model can be extended to a broader range of behaviors. But first we will look at related work and take a brief look at face-to-face interaction the way it has most often been conducted in the last few millennia: between people.

FIGURE 1. The prototype humanoid Gandalf's face and hand, comprising a total of 23 degrees of freedom.
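The paper does not show how the on/off switching of dialogue actions is exposed; purely as a hypothetical sketch of the kind of experimental control it describes, one might imagine a registry like the one below. Every name here (DialogueActionSwitchboard, register, fire) is invented for illustration and is not Ymir's API.

```python
class DialogueActionSwitchboard:
    """Hypothetical on/off registry for dialogue behaviors, illustrating
    how an experimenter could suppress individual actions per condition."""
    def __init__(self):
        self._enabled = {}
        self._handlers = {}

    def register(self, name, handler, enabled=True):
        self._handlers[name] = handler
        self._enabled[name] = enabled

    def set_enabled(self, name, value):
        self._enabled[name] = value

    def fire(self, name, *args):
        if self._enabled.get(name, False):
            return self._handlers[name](*args)
        return None  # behavior suppressed for this experimental condition

# Example: disable back-channel feedback to study grounding without it.
board = DialogueActionSwitchboard()
board.register("back-channel", lambda: print("mm-hm"))
board.register("turn-yield", lambda: print("(pauses, looks at user)"))
board.set_enabled("back-channel", False)
board.fire("back-channel")   # suppressed
board.fire("turn-yield")     # runs
```

This is the sense in which a computational dialogue agent can outdo a skilled actor as an experimental instrument: a single behavior can be removed cleanly while everything else is held constant.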
doi:10.1080/088395199117342