Cloud architecture for plant phenotyping research

Olivier Debauche, Sidi Ahmed Mahmoudi, Nicolas De Cock, Saïd Mahmoudi, Pierre Manneback, Frédéric Lebeau
2020 Concurrency and Computation  
Digital phenotyping is an emerging science based mainly on imaging techniques. The tremendous amount of data it generates requires substantial cloud computing resources to process. Coupling recent advances in distributed databases and cloud computing offers new possibilities for big data management and data sharing in scientific research. In this paper, we present a solution combining a lambda architecture built around Apache Druid and a hosting platform based on Apache Mesos. The lambda architecture has already proved its performance and robustness. However, the database's ingestion and query capacity is essential and can become a bottleneck for the architecture, in particular in terms of data availability and response time. We focused our experiments on the response time of different databases in order to choose the one best suited to our phenotyping architecture. Apache Druid has shown its ability to answer typical queries of phenotyping applications generally in less than one second.
KEYWORDS
cloud architecture, digital phenotyping, lambda architecture, plant phenotyping, research application hosting platform
INTRODUCTION
With the growth of the global population, the need for crop production and raw fiber has also increased considerably. Indeed, the Food and Agriculture Organization of the United Nations (FAO) predicts that the global population will reach 8 billion people by 2025 and 9.6 billion by 2050. In practice, this means that worldwide food production must increase by 70% by 2050. The significant increase in global population and the rising demand for high-quality products create the need to modernize and intensify agricultural practices. At the same time, water and other resources must be used efficiently. Hence, high-efficiency cultivars must be selected through improved plant breeding.
Plant phenotyping is defined as the application of methodologies and protocols to measure a specific trait, from the cellular level to the whole plant or canopy level.[1] Phenotyping is mainly based on imaging techniques (eg, 2D, 3D, or hyperspectral images). Plant productivity results from the interaction between the plant's genotype and its environment. Therefore, phenotyping data should be correlated with environmental conditions, which further increases the amount of data required. Plant phenotyping was identified as a priority for the European research area.
Nowadays, phenotyping research programs such as transPLANT (Trans-national Infrastructure for Plant Genomic Science) and EPPN2020 (European Plant Phenotyping Network) are emerging, and networks of research activities that increase interaction between phenotyping facilities and users are already in place, such as iPlant, a collaborative cyber-infrastructure that can be extended through its API to meet researchers' needs. The quantitative analysis of plant phenotypes, structure, and function has become the major bottleneck. Moreover, the number of images and of methods used to store and process these data is continuously growing. Consequently, the high data throughput and the need for specific processing in real or near-real time require substantial resources. The increasing number of particular phenotyping tasks in a given case study requires the development of specific applications. Cloud architectures offer a means to store a wide range of huge and heterogeneous data. In addition, they can host a large number of specific models and software tools to process these data. Apache Hadoop and Apache Spark are used to rapidly process 2D, 3D, and hyperspectral images. The data platform allows hosting applications and accessing stored data within the lambda architecture. Another advantage of this platform is that research teams can exchange, share, and access different models and data while complete data traceability, privacy, and security are ensured.
BACKGROUND
LAMBDA ARCHITECTURE
The literature shows that a lambda architecture[29] is formed by three layers: (1) the batch layer, which ingests and stores large immutable datasets and provides batch views; (2) the speed layer, which processes stream data, produces real-time views, and deploys them on the serving layer; and (3) the serving layer, which receives client queries and serves views that merge the batch and real-time views.
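To make the division of labor between the three layers concrete, here is a minimal in-memory sketch, assuming a hypothetical `Measurement` record with a single illustrative trait (`leaf_area_cm2`) and a hypothetical query "mean trait value per plant". The names `batch_view`, `SpeedLayer`, and `serve_mean` and the `cutoff` timestamp are illustrative assumptions, not part of the described system, and the sketch does not model Apache Druid or any other database.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Measurement:
    """One immutable phenotyping observation (hypothetical schema)."""
    plant_id: str
    timestamp: int        # seconds since an arbitrary epoch (assumed)
    leaf_area_cm2: float  # illustrative trait, not taken from the paper


View = Dict[str, Tuple[float, int]]  # plant_id -> (sum, count)


def batch_view(master: Iterable[Measurement], cutoff: int) -> View:
    """Batch layer: recompute (sum, count) per plant from the immutable
    master dataset, using only records older than `cutoff`."""
    acc: Dict[str, list] = defaultdict(lambda: [0.0, 0])
    for m in master:
        if m.timestamp < cutoff:
            acc[m.plant_id][0] += m.leaf_area_cm2
            acc[m.plant_id][1] += 1
    return {pid: (s, n) for pid, (s, n) in acc.items()}


class SpeedLayer:
    """Speed layer: incrementally maintain a real-time view of events
    the last batch run has not covered yet (timestamp >= cutoff)."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self._acc: Dict[str, list] = defaultdict(lambda: [0.0, 0])

    def on_event(self, m: Measurement) -> None:
        if m.timestamp >= self.cutoff:
            entry = self._acc[m.plant_id]
            entry[0] += m.leaf_area_cm2
            entry[1] += 1

    def view(self) -> View:
        return {pid: (s, n) for pid, (s, n) in self._acc.items()}


def serve_mean(plant_id: str, batch: View, realtime: View) -> Optional[float]:
    """Serving layer: answer a query by merging batch and real-time views."""
    s_b, n_b = batch.get(plant_id, (0.0, 0))
    s_r, n_r = realtime.get(plant_id, (0.0, 0))
    n = n_b + n_r
    return (s_b + s_r) / n if n else None


# Usage: two historical measurements plus one freshly streamed measurement.
master = [Measurement("p1", 100, 10.0), Measurement("p1", 200, 12.0)]
cutoff = 300
batch = batch_view(master, cutoff)
speed = SpeedLayer(cutoff)
speed.on_event(Measurement("p1", 350, 14.0))
print(serve_mean("p1", batch, speed.view()))  # 12.0
```

The views store (sum, count) pairs rather than means because partial sums and counts can be merged exactly, whereas two partial means cannot. In a full lambda architecture the batch layer would periodically recompute its view over the whole master dataset, after which the speed layer's state for the newly covered period is discarded and the cutoff moves forward; that recycling step is omitted here.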
This architecture is able to collect and store the wide range of data produced by phenotyping, together with environmental parameters such as temperature, relative humidity, cation-exchange capacity (CEC), NO₃, etc. We chose a lambda architecture because it can process data from various phenotyping systems, either in real time from automated high-throughput phenotyping systems or through time-delayed batch processing of manually acquired images.
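The two acquisition paths described above can be sketched as a small dispatcher, assuming a hypothetical `Record` type that carries an image reference plus environmental readings, and assuming the routing rule that every record is appended to the batch layer's master dataset while only automated (streamed) records are also forwarded to the speed layer. The class names, field names, and environmental keys (`temperature_c`, `cec_cmol_kg`, and so on) are illustrative assumptions, not the paper's schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Source(Enum):
    AUTOMATED = "automated"  # high-throughput system streaming in real time
    MANUAL = "manual"        # manually acquired images, uploaded later


@dataclass
class Record:
    source: Source
    timestamp: int
    image_uri: str
    # Hypothetical keys, e.g. "temperature_c", "relative_humidity_pct",
    # "cec_cmol_kg", "no3_mg_kg".
    environment: Dict[str, float] = field(default_factory=dict)


@dataclass
class Ingestor:
    """Hypothetical dispatcher in front of the batch and speed layers."""
    master_dataset: List[Record] = field(default_factory=list)  # append-only, read by the batch layer
    stream: List[Record] = field(default_factory=list)          # queue consumed by the speed layer

    def ingest(self, rec: Record) -> None:
        # Every record goes to the master dataset, so the next batch run includes it.
        self.master_dataset.append(rec)
        # Only automated, real-time sources are also forwarded to the speed layer.
        if rec.source is Source.AUTOMATED:
            self.stream.append(rec)


ing = Ingestor()
ing.ingest(Record(Source.AUTOMATED, 1000, "img/plant1_rgb.png",
                  {"temperature_c": 22.5, "relative_humidity_pct": 60.0}))
ing.ingest(Record(Source.MANUAL, 900, "img/plant2_hyperspectral.tif",
                  {"cec_cmol_kg": 15.0, "no3_mg_kg": 12.0}))
print(len(ing.master_dataset), len(ing.stream))  # 2 1
```

The Python lists stand in for durable storage and a message queue; a real deployment would replace them with distributed components, but the routing decision itself stays the same.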
doi:10.1002/cpe.5661