Deep learning for plant identification: how the web can compete with human experts
Biodiversity Information Science and Standards
Automated identification of plants and animals has improved considerably in the last few years, thanks in particular to recent advances in deep learning. To evaluate the performance of automated plant identification technologies in a sustainable and repeatable way, a dedicated system-oriented benchmark was set up in 2011 in the context of ImageCLEF (Goëau et al. 2011). Each year since then, several research groups have participated in this large collaborative evaluation by
... ing their image-based plant identification systems. In 2014, the LifeCLEF research platform (Joly et al. 2014) was created as a continuation of this effort, broadening the evaluated challenges to cover birds and fish in addition to plants, and audio and video content in addition to images. The 2017 edition of the LifeCLEF plant identification challenge (Joly et al. 2017) is an important milestone towards automated plant identification systems working at the scale of continental floras, with 10,000 plant species, living mainly in Europe and North America, illustrated by a total of 1.1M images. Such ambitious systems are now made possible by the combination of rapid recent progress in image classification with deep learning and several outstanding international initiatives that aggregate the visual knowledge on plant species held by the main national botanical institutes. The PlantCLEF challenge that we propose to present at this workshop aimed at evaluating to what extent a large, noisy training dataset collected through the web (and therefore containing many labelling errors) can compete with a smaller but trusted training dataset checked by experts. To compare both training strategies fairly, the test dataset was created from a third data source, the Pl@ntNet mobile application (Joly et al. 2015), which collects millions of plant image queries all over the world. Given the good results obtained in the 2017 edition of the LifeCLEF plant identification challenge, the next big question is how far such automated systems are from human expertise. Indeed, even the best experts are sometimes confused and/or disagree with each other when validating images of living organisms. Multimedia data actually contain only partial information, which is usually not sufficient to determine the right species with certainty.
Quantifying this uncertainty and comparing it to the performance of automated systems is of high interest for both computer scientists and expert naturalists. This work reports an experimental study following this idea in the plant domain. In total, 9 deep-learning systems implemented by 3 different research teams were compared with 9 expert botanists of the French flora. The main outcome of this work is that the performance of state-of-the-art deep learning models is now close to the most advanced human expertise. This shows that automated plant identification systems are now mature enough for several routine tasks and can offer very promising tools for autonomous ecological surveillance systems.