Audio-Visual Self-Supervised Terrain Type Recognition for Ground Mobile Platforms

Akiyoshi Kurobe, Yoshikatsu Nakajima, Kris Kitani, Hideo Saito
2021 IEEE Access  
The ability to recognize and identify terrain characteristics is essential for many autonomous ground robots, such as social robots, assistive robots, autonomous vehicles, and ground exploration robots. Recognizing and identifying terrain characteristics is challenging because similar terrains may have very different appearances (e.g., carpet comes in many colors), while terrains with very similar appearances may have very different physical properties (e.g., mulch versus ...). To address the inherent ambiguity in vision-based terrain recognition and identification, we propose a multi-modal self-supervised learning technique that switches between audio features extracted from a microphone attached to the underside of a mobile platform and image features extracted by a camera on the platform to cluster terrain types. The terrain cluster labels are then used to train an image-based real-time CNN (Convolutional Neural Network) to predict terrain type changes. Through experiments, we demonstrate that the proposed self-supervised terrain type recognition method achieves over 80% accuracy, which greatly outperforms several baselines and suggests strong potential for assistive applications.
INDEX TERMS Ground robots, assistive application, self-supervised learning, CNN.
This work is licensed under a Creative Commons Attribution 4.0 License. VOLUME 9, 2021
doi:10.1109/access.2021.3059620 fatcat:rkgokxxk4nesnobhsueoyibxsy
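
The abstract describes a two-stage idea: cluster terrain types without manual labels, then use the cluster labels to train an image-based predictor. The following is a minimal sketch of that general pseudo-labeling pattern, not the authors' implementation. It makes several simplifying assumptions: the data is synthetic, clustering uses plain k-means on audio features only (the paper switches between audio and image features), and a nearest-centroid classifier stands in for the CNN. All function names (`kmeans`, `fit_centroids`, `predict`, `run_demo`) are hypothetical.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster index per input point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def fit_centroids(features, labels):
    """Stand-in for CNN training: one mean image feature per pseudo-label."""
    return {c: mean([f for f, lab in zip(features, labels) if lab == c])
            for c in sorted(set(labels))}


def predict(model, feature):
    """Predict the pseudo-label whose centroid is nearest to the feature."""
    return min(model, key=lambda c: dist2(feature, model[c]))


def make_samples(rng, center, n, std):
    """Synthetic features: Gaussian noise around a fixed center (assumed)."""
    return [[rng.gauss(m, std) for m in center] for _ in range(n)]


def run_demo(n=20, seed=0):
    rng = random.Random(seed)
    # Two synthetic terrains; sample i has paired audio and image features.
    audio = (make_samples(rng, [0.0, 0.0], n, 0.3)
             + make_samples(rng, [5.0, 5.0], n, 0.3))
    image = (make_samples(rng, [1.0, 0.0, 0.0], n, 0.1)
             + make_samples(rng, [0.0, 1.0, 0.0], n, 0.1))
    # Stage 1: cluster audio features to obtain pseudo-labels.
    pseudo = kmeans(audio, k=2)
    # Stage 2: train an image-only predictor on those pseudo-labels.
    model = fit_centroids(image, pseudo)
    # At test time only image features are needed.
    test_a = make_samples(rng, [1.0, 0.0, 0.0], 1, 0.1)[0]
    test_b = make_samples(rng, [0.0, 1.0, 0.0], 1, 0.1)[0]
    return pseudo, predict(model, test_a), predict(model, test_b)


if __name__ == "__main__":
    pseudo, pred_a, pred_b = run_demo()
    print("pseudo-labels:", pseudo)
    print("held-out predictions:", pred_a, pred_b)
```

The point of the sketch is the data flow: audio is used only to generate labels, so the predictor that runs at test time depends on image features alone.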