Compiler and runtime support for the execution of scientific codes with unstructured datasets on heterogeneous parallel architectures [thesis]

Pablo Barrio López-Cortijo
Simulation codes based on the discretization of time and space for solving systems of partial differential equations are now used in relevant industrial sectors. These include, for example, the finite volume and finite element methods typically used in Computational Fluid Dynamics (CFD). These simulation applications are widely used in various industries such as aeronautics, automotive or weather prediction. More recently, they have been used in novel applications such as the simulation of blood streams for medical diagnosis, or the design of wind turbines for energy generation. The complexity of these simulations and the sheer size of the datasets used often require a considerable amount of computing power in order to obtain results in a reasonable time. In recent years, the optimization of these applications has been a priority for many companies, and even real-time performance is now often considered a stretch goal for optimization attempts. High Performance Computing (HPC) systems have been used for some time now to run these simulations, achieving reasonable speedups by partitioning the datasets and running the simulation on several processors. However, the need to run them in as little time as possible, the increase in dataset sizes to allow for higher accuracy, and limitations in the multi-process scalability of CFD simulations have resulted in joint worldwide efforts to achieve the ever-increasing performance and problem size objectives by means of newer, disruptive technologies. This thesis approaches the problem of optimizing the execution of these codes with the help of heterogeneous HPC systems. These systems differ from standard, homogeneous HPC systems in that the architectures of the processing elements used as building blocks differ from each other. A special interest of this work is to analyze the feasibility of using Field-Programmable Gate Arrays (FPGAs) in these systems as accelerators for scientific simulations. Because these devices are essentially reconfigurable hardware, they allow finer-grained parallelism than general-purpose processors, which translates into a higher throughput when computational kernels are ported to them. Additionally, FPGAs achieve levels of power efficiency that are currently unparalleled by any existing mainstream computing device. Unfortunately, the novelty of this approach implies that the ...
doi:10.20868/upm.thesis.48358 fatcat:mazktnzwrrb5rhugvl7trcj4my