Prediction of a plant intracellular metabolite content class using image-based deep learning
Plant-derived secondary metabolites play a vital role in the food, pharmaceutical, agrochemical and cosmetic industries. Measuring metabolite concentrations involves extraction, biochemical processing and analysis, which require time, access to expensive equipment and reagents, and specialized skills. Additionally, metabolite concentration often varies widely among plants, even within a small area. A quick method to estimate the metabolite concentration class (high or low) will significantly help in selecting
... yielding high metabolites for the metabolite production process. Here, we demonstrate a deep learning approach to estimate the concentration class of an intracellular metabolite, azadirachtin, using models built with images of leaves and fruits collected from randomly selected Azadirachta indica (neem) trees in an area spanning >500,000 sq. km, together with their corresponding biochemically measured metabolite concentrations. We divided the input data randomly into training and test sets ten times to avoid sampling bias and to optimize the model parameters during cross-validation. The training set contained >83,000 fruit and >86,000 leaf images. The best models yielded prediction errors of 19.13% and 15.11% for the low and high metabolite classes, respectively, using fruit images, and 8% and 26.67% for the low and high classes, respectively, using leaf images. We further validated the fruit model using independently collected fruit images from different locations spanning nearly 130,000 sq. km, achieving 70% accuracy. We developed a desktop application to scan offline images and a mobile application for real-time prediction of the metabolite content class. Our work demonstrates the use of a deep learning method to estimate the concentration class of an intracellular metabolite using images, and has broad applications and utility.
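As a rough illustration of the evaluation described above, the following minimal Python sketch shows how repeated random train/test splits and per-class prediction errors could be computed. It is not the authors' code. The function names `random_splits` and `per_class_error`, the 20% test fraction, the fixed seed and the toy labels are all assumptions made for illustration; the abstract states only that the data were split randomly ten times and that errors were reported separately for the low and high classes.

```python
import random
from collections import Counter


def random_splits(n_samples, test_fraction=0.2, repeats=10, seed=0):
    """Yield (train_idx, test_idx) index lists from repeated random shuffles.

    test_fraction and seed are illustrative assumptions, not values
    taken from the abstract.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # First n_test shuffled indices form the test set, the rest train.
        yield idx[n_test:], idx[:n_test]


def per_class_error(y_true, y_pred):
    """Return {class: percentage of that class's samples misclassified}."""
    totals = Counter(y_true)
    wrong = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return {c: 100.0 * wrong[c] / totals[c] for c in totals}


if __name__ == "__main__":
    # Toy labels only; these are not data from the study.
    y_true = ["low", "low", "high", "high"]
    y_pred = ["low", "high", "high", "high"]
    print(per_class_error(y_true, y_pred))
    for train_idx, test_idx in random_splits(10, repeats=3):
        print(len(train_idx), len(test_idx))
```

Reporting the error per class, rather than a single overall accuracy, keeps a weak result on one class from being hidden by a strong result on the other, which matters when the low and high classes behave as differently as the leaf figures above suggest.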