OmpSs@FPGA framework for high performance FPGA computing

Juan Miguel Deharo, Jaume Bosch, Antonio Filgueras, Miquel Vidal Pinol, Daniel Jimenez-Gonzalez, Carlos Alvarez, Xavier Martorell, Eduard Ayguade, Jesus Labarta
2021 IEEE Transactions on Computers
This paper presents the new features of the OmpSs@FPGA framework. OmpSs is a data-flow programming model that supports task nesting and dependencies to target asynchronous parallelism and heterogeneity. OmpSs@FPGA is the extension of the programming model addressed specifically to FPGAs. The OmpSs environment is built on top of the Mercurium source-to-source compiler and the Nanos++ runtime system. To address FPGA specifics, the Mercurium compiler implements several FPGA-related features, such as local variable ... g, wide memory accesses or accelerator replication. In addition, part of the Nanos++ runtime has been ported to hardware. Driven by the compiler, this new hardware runtime adds new features to FPGA codes, such as task creation and dependence management, providing both performance increases and ease of programming. To demonstrate these new capabilities, different high performance benchmarks have been evaluated over different FPGA platforms using the OmpSs programming model. The results demonstrate that programs that use the OmpSs programming model achieve very competitive performance with low to moderate porting effort compared to other FPGA implementations.

Index Terms: FPGA, reconfigurable hardware, parallel architectures, task-based programming models, High-Level Synthesis

• All authors are with the Barcelona ...

... with other tasks. Dependencies can be declared for any task; they prevent two tasks that operate on the same memory region from executing in parallel by establishing an implicit execution order through dynamic dependence graphs. To generate the executable from the original code, OmpSs uses its own compiler, Mercurium, and its own runtime system, Nanos++. The compiler processes the pragmas, transforms the code as needed and generates calls to the Nanos++ API [3]. The runtime manages everything needed to execute tasks concurrently, analyzing task dependencies dynamically and scheduling the tasks onto the CPU threads. OmpSs@FPGA extends OmpSs by allowing a C/C++ task to execute on the FPGA. To do so, it uses Mercurium to transform the code and build a hardware accelerator through High-Level Synthesis (HLS) of the transformed code. In this way, several accelerators, each executing a specific type of task, can be used to speed up a previously CPU-only application. Of course, to get the best performance, the original code has to target the FPGA, which needs a different optimization strategy.
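The dependence mechanism described above can be sketched in C as follows. This is a minimal illustration, not code from the paper: the function names, the block size `N` and the `USE_OMPSS` guard macro are hypothetical, and the clause spellings (`target device(fpga)`, `in`/`out` with array sections, `taskwait`) follow publicly documented OmpSs syntax as an assumption, since they may differ between OmpSs@FPGA releases. With the guard undefined, a plain C compiler never sees the pragmas and runs the two calls serially in program order, which is the same order the dependence on `c` would impose on the tasks.

```c
#include <stddef.h>

#define N 8 /* hypothetical block size, chosen only for this sketch */

/* USE_OMPSS is a hypothetical macro: define it only when building with an
 * OmpSs toolchain. Clause names below are assumed from OmpSs documentation. */

#ifdef USE_OMPSS
#pragma omp target device(fpga)
#pragma omp task in([N]a, [N]b) out([N]c)
#endif
void vec_add(const int *a, const int *b, int *c)
{
    /* Producer task: writes the whole region c[0..N-1]. */
    for (size_t i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }
}

#ifdef USE_OMPSS
#pragma omp task in([N]c) out([N]d)
#endif
void vec_scale(const int *c, int *d)
{
    /* Consumer task: reads c, so it must run after vec_add has finished. */
    for (size_t i = 0; i < N; i++) {
        d[i] = 2 * c[i];
    }
}

int run_chain(void)
{
    int a[N], b[N], c[N], d[N];
    int sum = 0;

    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 10 * i;
    }

    vec_add(a, b, c);  /* declares out(c) */
    vec_scale(c, d);   /* declares in(c): ordered after vec_add */

#ifdef USE_OMPSS
#pragma omp taskwait   /* wait for both tasks before reading d */
#endif

    for (int i = 0; i < N; i++) {
        sum += d[i];
    }
    return sum;
}
```

Calling `run_chain()` from a `main` function returns the sum of `d`. The two calls are never reordered because the second task names as input the region the first task names as output; two tasks with no overlapping regions would be free to run in parallel.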
Moreover, the framework can coexist with the specific pragmas of the underlying HLS tool, so that the features the tool provides can be used to boost the accelerator even further. Currently, OmpSs@FPGA supports Xilinx software and FPGAs, so the HLS tool is Vivado HLS. The main contributions of this article are the following:
• New compiler optimizations for FPGA accelerators that improve memory accesses by the use of a wide memory port.
• A new load/store mechanism that saves redundant memory copies in the FPGA accelerators.
• A new way to pipeline computations and memory accesses inside FPGA accelerators.
• A new complete hardware runtime that in conjunc-
doi:10.1109/tc.2021.3086106 fatcat:hmroqubflbffzire7fnsvabssi