Editorial: Machine Learning for Big Data Analysis: Applications in Plant Breeding and Genomics

Salvatore Esposito, Valentino Ruggieri, Pasquale Tripodi
2022 Frontiers in Genetics  
Next-generation sequencing (NGS) technologies, advanced phenotyping platforms, and machine learning (ML), defined as "the science of programming computers so they can learn from data", are leading a new revolution in plant breeding, facilitating a deep understanding of the genotype and its relationship with the phenotype, especially for complex traits (Figure 1). In this stimulating scenario, the Research Topic introduces five original research papers and one review focusing on computational analysis and
machine learning-based approaches for plant breeding and genomics. Among the strategies that researchers can adopt to accelerate crop breeding and boost plant production, genomic selection (GS) has been proposed in recent years to design novel breeding programs and to develop new marker-based models for genetic evaluation, thus providing new opportunities to increase the genetic gain of complex traits per unit of time and cost. However, as summarized in the review by Danilevicz et al., most of the tools routinely used in genomic selection studies are not designed to capture non-linear relationships within multidimensional datasets or to deal with big datasets such as imagery collected by drones. By contrast, given their capacity to extract data features and represent their relationships at multiple levels of abstraction, ML algorithms have the potential to overcome the limits on prediction accuracy of the tools routinely used for genotype-to-phenotype prediction. Machine learning methods can be classified in three ways: supervised versus unsupervised models, linear versus nonlinear algorithms, and shallow versus deep learning models. Artificial neural networks (ANNs), deep neural networks (DNNs), convolutional neural networks (CNNs), random forest (RF), and support vector machines (SVMs) are only a few examples of nonlinear nonparametric machine learning algorithms that can be applied to non-linear data in plant studies. The review by Danilevicz and colleagues summarizes the challenges of applying different machine learning methods to increase the accuracy of predicting phenotypic traits from molecular markers, environmental data, and imagery. The paper by Montesinos-López et al. also underlines the need to overcome the current limits of genomic selection and proposes a new deep-learning calibration method that can enhance genome-based prediction of continuous crop traits.
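As a rough illustration of what a linear marker-based genomic selection model looks like in practice, the sketch below fits ridge regression on simulated marker genotypes and reports the predictive correlation on held-out lines. It is not taken from any of the papers in the Research Topic: the data are simulated, the function names (`simulate_markers`, `fit_ridge`, `predict_ridge`), the sample sizes, and the penalty `lam` are all assumptions chosen for illustration. Ridge regression on markers is used here only as a stand-in for the linear baselines (such as GBLUP) against which nonlinear ML models are typically compared.

```python
import numpy as np

def simulate_markers(n_lines, n_markers, rng):
    """Simulate marker genotypes coded 0/1/2 and a purely additive trait (toy data)."""
    X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
    effects = rng.normal(0.0, 0.1, size=n_markers)          # hypothetical marker effects
    y = X @ effects + rng.normal(0.0, 0.5, size=n_lines)    # additive signal plus noise
    return X, y

def fit_ridge(X, y, lam):
    """Ridge regression on centered markers; returns (marker means, trait mean, effects)."""
    mean_x = X.mean(axis=0)
    mean_y = y.mean()
    Xc = X - mean_x
    # Dual form: solve an (n_lines x n_lines) system built from Xc @ Xc.T,
    # an unscaled marker-based similarity matrix between lines.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y - mean_y)
    beta = Xc.T @ alpha
    return mean_x, mean_y, beta

def predict_ridge(model, X):
    """Predict trait values for new lines from their marker genotypes."""
    mean_x, mean_y, beta = model
    return (X - mean_x) @ beta + mean_y

rng = np.random.default_rng(0)
X, y = simulate_markers(n_lines=500, n_markers=200, rng=rng)
train, test = np.arange(400), np.arange(400, 500)

model = fit_ridge(X[train], y[train], lam=25.0)
y_hat = predict_ridge(model, X[test])

# Prediction accuracy is reported here as the correlation between
# predicted and observed trait values in the held-out lines.
accuracy = np.corrcoef(y_hat, y[test])[0, 1]
print(f"predictive correlation on held-out lines: {accuracy:.2f}")
```

Because the simulated trait is purely additive, a linear model is well matched to it; the non-linear relationships discussed above are exactly the cases where such a baseline would be expected to fall short.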
One of the big challenges in GS studies is the training process, mainly because of the large number of hyperparameters that must be tuned, which increases the risk of introducing bias into the analysis. For this reason, the authors proposed a simple method for calibrating (adjusting) continuous predictions resulting from deep learning applications. The proposed deep learning calibration method (DL_M2) was tested on four different crop breeding datasets, and its performance was compared with the standard deep learning method (DL_M1) and with the standard genomic best linear unbiased predictor (GBLUP). The authors claimed that, although the GBLUP was the most accurate model, the proposed deep
doi:10.3389/fgene.2022.916462 pmid:35711914 pmcid:PMC9197449 fatcat:foxmaidycrdkrdczz3tvf7sbzu