Multimodal interfaces: Challenges and perspectives

Nicu Sebe
2009 Journal of Ambient Intelligence and Smart Environments  
The development of interfaces has been a technology-driven process. However, newly developed multimodal interfaces use recognition-based technologies that must interpret human speech, gesture, gaze, movement patterns, and other behavioral cues. As a result, interface design requires a human-centered approach. In this paper we review the major approaches to multimodal Human Computer Interaction, giving an overview of user and task modeling and of multimodal fusion. We
... t the challenges, open issues, and the future trends in multimodal interfaces research.

eye blinks [23], and many others). Glove-mounted devices [8] and graspable user interfaces [20], for example, now seem ripe for exploration. Pointing devices with haptic feedback, eye tracking, and gaze detection [27] are also currently emerging. As in human-human communication, however, effective communication is likely to take place when different input devices are used in combination. Multimodal interfaces have been shown to have many advantages [12]: they prevent errors, bring robustness to the interface, help the user to correct errors or recover from them more easily, bring more bandwidth to the communication, and offer alternative communication methods for different situations and environments. Disambiguating error-prone modalities is one important motivation for using multiple modalities in many systems. As shown by Oviatt [48], error-prone technologies can compensate for each other, rather than merely adding redundancy to the interface, and thereby reduce the need for error correction. It should be noted, however, that multiple modalities alone do not guarantee benefits to the interface: the use of multiple modalities may be ineffective or even disadvantageous. In this context, Oviatt [49] has presented the common misconceptions (myths) of multimodal interfaces, most of them related to the use of speech as an input modality.
doi:10.3233/ais-2009-0003 fatcat:y2zimhpxrngznf64a2yulttkhm