A Portable OpenCL Lattice Boltzmann Code for Multi- and Many-core Processor Architectures

Enrico Calore, Sebastiano Fabio Schifano, Raffaele Tripiccione
2014 Procedia Computer Science  
The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task that often requires the use of specific programming environments. Programming frameworks supporting codes portable across different high performance architectures have recently appeared, but one must carefully assess the relative costs of portability versus computing efficiency, and find a reasonable tradeoff point. In this paper we address precisely this issue, using as test-bench a Lattice Boltzmann code implemented in OpenCL. We analyze its performance on several different state-of-the-art processors: NVIDIA GPUs and Intel Xeon-Phi many-core accelerators, as well as more traditional Ivy Bridge and Opteron multi-core commodity CPUs. We also compare with results obtained with codes specifically optimized for each of these systems. Our work shows that a properly structured OpenCL code runs on many different systems reaching performance levels close to those obtained by architecture-tuned CUDA or C codes.
doi:10.1016/j.procs.2014.05.004 fatcat:exrfaxm7nnhjdlectgdoop5cci