Toward a theory of organized multimodal integration patterns during human-computer interaction

Sharon Oviatt, Rachel Coulston, Stefanie Tomko, Benfang Xiao, Rebecca Lunsford, Matt Wesson, Lesley Carmichael
2003 Proceedings of the 5th International Conference on Multimodal Interfaces - ICMI '03
As a new generation of multimodal systems begins to emerge, one dominant theme will be the integration and synchronization requirements for combining modalities into robust whole systems. In the present research, quantitative modeling is presented on the organization of users' speech and pen multimodal integration patterns. In particular, the potential malleability of users' multimodal integration patterns is explored, as well as variation in these patterns during system error handling and ... varying in difficulty. Using a new dual-wizard simulation method, data was collected from twelve adults as they interacted with a map-based task using multimodal speech and pen input. Analyses based on over 1600 multimodal constructions revealed that users' dominant multimodal integration pattern was resistant to change, even when strong selective reinforcement was delivered to encourage switching from a sequential to simultaneous integration pattern, or vice versa. Instead, both sequential and simultaneous integrators showed evidence of entrenching further in their dominant integration patterns (i.e., increasing either their intermodal lag or signal overlap) over the course of an interactive session, during system error handling, and when completing increasingly difficult tasks. In fact, during error handling these changes in the co-timing of multimodal signals became the main feature of hyper-clear multimodal language, with elongation of individual signals either attenuated or absent. Whereas Behavioral/Structuralist theory cannot account for these data, it is argued that Gestalt theory provides a valuable framework and insights into multimodal interaction. Implications of these findings are discussed for the development of a coherent theory of multimodal integration during human-computer interaction, and for the design of a new class of adaptive multimodal interfaces.
doi:10.1145/958432.958443 dblp:conf/icmi/OviattCTXLWC03 fatcat:qvkovaogvnha3lnve3pbbj2ko4