Multimodal human-robot interaction in an assistive technology environment

Zhi Li
The research work presented in this thesis is motivated by the increasing demand for elderly care. A domestic assistive robot has the potential to supplement human carers by assisting the elderly with simple daily tasks, such as retrieving small objects from various places, switching lights on and off, and opening and closing doors. The proposed assistive robot possesses both transactional intelligence and spatial intelligence. This thesis concentrates on the realization of the transactional intelligence, which enables the robot to interact naturally and effectively with human users. The ultimate goal of this research is to develop a system through which the robot perceives the multiple modalities humans use in face-to-face communication, including speech, eye gaze and gestures, so that the robot can understand the user's intention and respond appropriately. Important features of the design and implementation of the system are as follows.

1. Naturalness and effectiveness are the fundamental principles in the design of the interaction interface; therefore, only cameras are used, as non-contact sensing devices.
2. The user is observed only from the robot's view, so the interaction can take place anywhere rather than being confined to a particular room.
3. Behavioural differences between individuals are emphasized, enabling the robot to respond appropriately to different users. This is achieved through a user identification method and a profile built for each individual user, which stores several characteristics of that user.
4. The proposed hand gesture recognition system recognizes both dynamic motion patterns and static hand postures. The 3D particle-filter-based hand tracking approach combines colour, motion and depth information, and robustly tracks the hands even when the person wears a short-sleeved shirt that exposes the forearm.
5. The different sources of information conveyed by speech, eye gaze and gestures are aligned and then combined by the proposed multimodal [...]
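The tracking idea in item 4 — weighting particle hypotheses by a product of colour, motion and depth cues — can be sketched in a few lines. This is a minimal illustration only, not the thesis implementation: the three likelihood functions here are stand-in Gaussians around a simulated hand position (in a real system they would come from skin-colour segmentation, frame differencing and the depth map), and all names and noise parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "true" 3D hand position; only used by the stand-in cue models.
TRUE_POS = np.array([0.3, -0.1, 0.8])

def colour_likelihood(p):   # stand-in for a skin-colour score
    return np.exp(-np.sum((p - TRUE_POS) ** 2, axis=-1) / 0.02)

def motion_likelihood(p):   # stand-in for a frame-difference score
    return np.exp(-np.sum((p - TRUE_POS) ** 2, axis=-1) / 0.05)

def depth_likelihood(p):    # stand-in for agreement with the depth map
    return np.exp(-((p[..., 2] - TRUE_POS[2]) ** 2) / 0.01)

def track_step(particles):
    # 1. Predict: diffuse particles with Gaussian process noise.
    particles = particles + rng.normal(0.0, 0.05, particles.shape)
    # 2. Update: weight = product of the independent cue likelihoods.
    w = (colour_likelihood(particles)
         * motion_likelihood(particles)
         * depth_likelihood(particles))
    w = w / w.sum()
    # 3. Estimate: weighted mean of the particle positions.
    estimate = (w[:, None] * particles).sum(axis=0)
    # 4. Resample: simple multinomial resampling by weight.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], estimate

particles = rng.uniform(-1.0, 1.0, size=(500, 3))
for _ in range(20):
    particles, estimate = track_step(particles)
print(np.round(estimate, 2))
```

Multiplying the cue likelihoods means a particle must score well on all three cues at once, which is what lets the tracker reject skin-coloured but static regions such as an exposed forearm.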
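Item 5 states that the speech, eye-gaze and gesture streams are first aligned before being combined; the abstract is truncated before the fusion method itself is named. Purely as an illustration of the alignment step, the sketch below groups recognizer events from different modalities that fall within a fixed time window of each speech event. Every name, the window size, and the choice of speech as the anchoring modality are assumptions for this example, not the thesis's method.

```python
from dataclasses import dataclass

@dataclass
class Event:
    modality: str   # "speech", "gaze" or "gesture"
    t: float        # timestamp in seconds
    value: str      # recognized label

def align(events, window=0.5):
    """Group gaze/gesture events whose timestamps fall within
    `window` seconds of each speech event.

    Deliberately simple: a real fusion system would also handle
    modality-specific latencies and recognizer confidence scores.
    """
    speech = [e for e in events if e.modality == "speech"]
    others = [e for e in events if e.modality != "speech"]
    groups = []
    for s in sorted(speech, key=lambda e: e.t):
        near = [o for o in others if abs(o.t - s.t) <= window]
        groups.append((s, near))
    return groups

events = [
    Event("speech", 1.2, "bring me that"),
    Event("gesture", 1.4, "point-left"),
    Event("gaze", 1.3, "table"),
    Event("gesture", 5.0, "wave"),   # too late; not grouped
]
groups = align(events)
print(groups[0][0].value, [e.value for e in groups[0][1]])
```

Grouped this way, a deictic utterance such as "bring me that" can be disambiguated by the pointing gesture and gaze target that co-occur with it, while unrelated gestures outside the window are ignored.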
doi:10.4225/03/58a66bb16debc