Machine Learning for High-Throughput Stress Phenotyping in Plants
Trends in Plant Science
Advances in automated and high-throughput imaging technologies have resulted in a deluge of high-resolution images and sensor data of plants. Extracting patterns and features from this large corpus of data, however, requires machine learning (ML) tools that enable data assimilation and feature identification for stress phenotyping. Different ML approaches can be deployed at four stages of the decision cycle in plant stress phenotyping and plant breeding activities: (i)
... on, (ii) classification, (iii) quantification, and (iv) prediction (ICQP). We provide here a comprehensive overview and user-friendly taxonomy of ML tools, together with best-practice guidelines, to enable the plant community to correctly and easily apply the appropriate ML tools to various biotic and abiotic stress traits.

Plant Stress Phenotyping in Agriculture

To meet the future demand for food, feed, fiber, and fuel, crop production must double by 2050. Crop yields are inherently limited by plant stresses (biotic and abiotic), and plant breeders have protected yield from stress-related losses by incorporating resistance genes and developing more climate-resilient cultivars. Plant breeders and researchers rely on plant phenotyping for accurate and precise trait collection, and on genetic tools, to achieve their research goals. Plant phenotyping is defined as the application of methodologies and protocols to measure a specific trait related to plant structure and function, at scales ranging from the cellular level to the whole plant or canopy level. Agricultural research programs phenotype large populations for several traits throughout the crop growth cycle. The challenge of phenotyping multiple traits across large populations is compounded by the need to sample multiple environments and to grow replicated trials. Until recently, traditional phenotyping methods had not kept pace with the available high-throughput genotyping tools. This phenotyping bottleneck has driven intense efforts by agricultural researchers and engineers to adopt newer technologies for field phenotyping. A classic example is high-throughput phenotyping (HTP), which has unlocked new prospects for non-destructive, field-based phenotyping of a large number of plant traits, including physiological traits, biotic stress traits (caused by living factors such as fungi, bacteria, viruses, insects, parasites, and weeds), and abiotic stress traits (caused by non-living factors such as drought, flooding, nutrient deficiency, and other environmental factors) [2,3]. Both ground and aerial HTP platforms equipped with multiple sensors are being used in agriculture to measure multiple plant traits at varying growth stages rapidly, precisely, and accurately (Figure 1A, Key Figure). Examples include HTP platforms deployed in cotton (Gossypium hirsutum L.), triticale (× Triticosecale Wittmack L.), and maize (Zea mays L.). Recent advances in sensors for imaging plants [7,8], ranging from remote sensing including spectroradiometry, Light Detection and Ranging (LIDAR), visible to far-infrared

Trends

High-throughput phenotyping (HTP) has unlocked new prospects for non-destructive, field-based phenotyping. Autonomous, semi-autonomous, or manual platforms equipped with single or multiple sensors collect spatial and temporal data, producing massive amounts of data for analysis and storage. The enormous volume, variety, and velocity of the data generated by such platforms make HTP a 'big data' problem. Big data generated by these near-real-time platforms must be efficiently archived and retrieved for analysis, and the analysis and interpretation of these large datasets are challenging. Sophisticated data collection, storage, and processing are becoming ubiquitous, and new areas of application are emerging constantly. One such relatively new domain is plant stress analytics. ML algorithms are a very promising approach for faster, more efficient, and better data analytics. Being inherently multidisciplinary, ML draws inspiration from and uses concepts of probability theory, statistics, decision theory, optimization, and visualization. Most current applications of ML in plant sciences have relied on a limited set of tools, chiefly support vector machines (SVMs) and artificial neural networks (ANNs).
A good understanding of which ML tools to apply, when, and how would help the plant community leverage them.
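To make the ICQP stages more concrete, here is a minimal sketch in plain Python (standard library only). It assumes that the first stage (the 'I' of ICQP, taken here to be identification) and classification can be framed as supervised classification, quantification as regression of a severity score, and prediction as extrapolation of that score over time. The nearest-centroid classifier and least-squares line are deliberately simple stand-ins, not the SVMs or ANNs named above. All feature names, labels, and values are hypothetical toy numbers, not data from any study.

```python
from statistics import mean

# Hypothetical toy training data: per-plot features
# (canopy temperature deviation in deg C, vegetation index) with a stress label.
# All values are invented for illustration; they are not measurements.
TRAIN = [
    ((0.2, 0.82), "healthy"),
    ((0.1, 0.79), "healthy"),
    ((2.5, 0.61), "drought"),
    ((2.8, 0.58), "drought"),
    ((0.4, 0.55), "disease"),
    ((0.6, 0.50), "disease"),
]


def centroids(samples):
    """Return the mean feature vector of each class label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(features, cents):
    """Classification: label of the nearest centroid (squared Euclidean distance).

    Features are left unscaled here; a real pipeline would normally standardize them.
    """
    return min(
        cents,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, cents[label])),
    )


def identify(features, cents):
    """Identification: flag a plot as stressed if its nearest centroid is not 'healthy'."""
    return classify(features, cents) != "healthy"


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def quantify(vegetation_index, model):
    """Quantification: map a vegetation index to a severity score via a fitted line."""
    slope, intercept = model
    return slope * vegetation_index + intercept


def predict(days, severities, future_day):
    """Prediction: extrapolate severity to a future day with a linear trend."""
    slope, intercept = fit_line(days, severities)
    return slope * future_day + intercept


if __name__ == "__main__":
    cents = centroids(TRAIN)
    plot = (2.6, 0.60)
    print(identify(plot, cents), classify(plot, cents))  # True drought
    # Hypothetical visual severity ratings (0-100) paired with vegetation index values.
    severity_model = fit_line([0.80, 0.70, 0.60, 0.50], [5, 30, 55, 80])
    print(round(quantify(0.60, severity_model), 1))  # 55.0
    print(round(predict([0, 7, 14], [10, 20, 30], 21), 1))  # 40.0
```

The point of the sketch is the shape of each stage, not the models: in practice, an SVM or ANN could replace `classify`, and a richer regression or time-series model could replace `fit_line`, without changing how the four stages hand results to one another.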