Generalization of spatio-temporal deep learning for vision-based force estimation

Finn Behrendt, Nils Thorben Gessert, Alexander Schlaefer, TUHH Universitätsbibliothek
Robot-assisted minimally-invasive surgery is increasingly used in clinical practice. Force feedback offers potential to develop haptic feedback for surgery systems. Forces can be estimated in a vision-based way by capturing deformation observed in 2D image sequences with deep learning models. Variations in tissue appearance and mechanical properties likely influence how well force estimation methods generalize. In this work, we study the generalization capabilities of different spatial and spatio-temporal deep learning methods across different tissue samples. We acquire several datasets using a clinical laparoscope and evaluate both purely spatial and spatio-temporal deep learning models. Our results show that generalization across different tissues is challenging. Nevertheless, we demonstrate that using spatio-temporal data instead of individual frames is valuable for force estimation. In particular, processing spatial and temporal data separately with a combined ResNet and GRU architecture shows promising results, with a mean absolute error of 15.450 mN compared to 19.744 mN for a purely spatial CNN.
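To put the two reported errors in perspective, the following minimal sketch computes a mean absolute error (MAE) and the relative reduction between the two values quoted above. The function names and the toy force values are illustrative assumptions, not taken from the paper; only the figures 15.450 mN and 19.744 mN come from the abstract.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Average of |prediction - target| over all samples (same unit as the inputs)."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Toy example with made-up force values in mN (hypothetical, for illustration only).
toy_mae = mean_absolute_error([10.0, 20.0], [12.0, 17.0])

# MAE values reported in the abstract, in mN.
MAE_SPATIAL_CNN = 19.744
MAE_RESNET_GRU = 15.450

reduction = relative_reduction(MAE_SPATIAL_CNN, MAE_RESNET_GRU)
print(f"Relative MAE reduction: {reduction:.1%}")
```

Under these numbers, the ResNet and GRU combination lowers the reported error by roughly a fifth relative to the purely spatial CNN.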
doi:10.15480/882.3038 fatcat:ypkoc3gm6vdfboqpkg7kdtfcei