Learning the retinal anatomy from scarce annotated data using self-supervised multimodal reconstruction
Applied Soft Computing
Deep learning is becoming the reference paradigm for approaching many computer vision problems. Nevertheless, the training of deep neural networks typically requires a significantly large amount of annotated data, which is not always available. A proven approach to alleviate the scarcity of annotated data is transfer learning. However, in practice, the use of this technique typically relies on the availability of additional annotations, either from the same or natural domain. We propose a novel
... alternative that allows to apply transfer learning from unlabelled data of the same domain, which consists in the use of a multimodal reconstruction task. A neural network trained to generate one image modality from another must learn relevant patterns from the images to successfully solve the task. These learned patterns can then be used to solve additional tasks in the same domain, reducing the necessity of a large amount of annotated data. In this work, we apply the described idea to the localization and segmentation of the most important anatomical structures of the eye fundus in retinography. The objective is to reduce the amount of annotated data that is required to solve the different tasks using deep neural networks. For that purpose, a neural network is pre-trained using the self-supervised multimodal reconstruction of fluorescein angiography from retinography. Then, the network is fine-tuned on the different target tasks performed on the retinography. The obtained results demonstrate that the proposed selfsupervised transfer learning strategy leads to state-of-the-art performance in all the studied tasks with a significant reduction of the required annotations. address: email@example.com (Á.S. Hervella). fields, the number of methods based on neural networks has grown significantly in the last few years, which carried an improvement of the obtained results    . Currently, the use of deep neural networks (DNNs) is the standard approach in many computer vision applications when the required annotated data is available. DNNs have not only improved the results obtained with traditional methods, but have also brought a new simplified paradigm where no feature design is needed  . Instead, the focus has shifted to the design or selection of the most suitable network architectures, training losses and training strategies  . Regarding the automatic analysis of representative anatomical structures in retinography, the main limitation for the early use of DNNs was the scarcity of annotated data  . In that sense, the available datasets typically present a small number of annotated samples due to the difficulty of hand-labelling the retinal images in detail. Moreover, despite that some large datasets have been gathered, in practice, the annotated data usually present a meagre representation of pathological cases , given that those images are typically of higher variability and complexity.