OpenMP compiler for distributed memory architectures

Jue Wang, ChangJun Hu, JiLin Zhang, JianJiang Li
2010 Science China Information Sciences  
OpenMP is an emerging industry standard for shared-memory architectures. While OpenMP has the advantages of ease of use and incremental parallelization, message passing is still the most widely used programming model for distributed-memory architectures today. How to extend OpenMP effectively to distributed-memory architectures has been an active research topic. This paper proposes an OpenMP system, called KLCoMP, for distributed-memory architectures. Based on the "partially replicating shared arrays" memory model, we propose an algorithm for shared-array recognition based on inter-procedural analysis, an optimization technique based on the producer/consumer relationship, and a communication generation technique for nonlinear references. We evaluate performance on nine benchmarks covering computational fluid dynamics, integer sorting, molecular dynamics, earthquake simulation, and computational chemistry. The average scalability achieved by the KLCoMP versions is close to that achieved by the MPI versions. We compare the performance of our translated programs with that of versions generated for Omni+SCASH, LLCoMP, and OpenMP (Purdue), and find that parallel applications (especially irregular applications) translated by KLCoMP achieve better performance than the other versions.

… did not break away from previous SDSM OpenMP, especially for irregular applications. A skeleton method [10] is used in LLCoMP to translate extended OpenMP to MPI. Because skeletons are difficult to optimize at compile time, this approach is ineffective for discontinuous data accesses. To some extent, extensions to OpenMP also increase the programming burden. The third category translates OpenMP to MPI+SDSM or to other parallel languages. Some research works [11, 12] translate OpenMP to MPI+SDSM to reduce the overhead of SDSM; however, SDSM remains the bottleneck of system performance. Chapman et al. [13] translated OpenMP to Global Arrays based on the OpenUH compiler.

KLCoMP consists of a source-to-source compiler and a runtime library. OpenMP directives are translated into the corresponding APIs provided by our runtime library, which is written in MPI+Pthreads. Compared with previous OpenMP compilers, KLCoMP employs the MPI + "partially replicating shared arrays" memory model to reduce the data volume maintained by the compiler, thereby reducing the compile-time and run-time overhead of programs.
Based on this memory model, how to effectively recognize shared variables, reduce or hide communication, and increase the efficiency of irregular applications is the key to improving KLCoMP. This paper proposes an effective algorithm for shared-array recognition based on inter-procedural analysis, optimizations based on the producer/consumer relationship, and a communication generation technique for nonlinear references. We adopt nine applications, covering computational fluid dynamics, integer sorting, Mandelbrot set computation, molecular dynamics, earthquake simulation, and computational chemistry, to evaluate the performance of KLCoMP. These applications come from widely used benchmarks, including the NAS benchmarks [14], SPEC OMPM2001 [15], COSMIC software [16], and CHARMM [17]. We note that five of the benchmarks are typical irregular applications. Taking these benchmarks as input, KLCoMP generates parallel code whose scalability is close to that of the MPI versions. Especially for irregular applications, the performance of the generated code is better than that of code generated by Omni+SCASH [2], LLCoMP [10], and OpenMP (Purdue) [7].

The rest of this paper is organized as follows. Section 2 presents the memory model of KLCoMP; Section 3 gives communication optimizations based on this memory model; Section 4 gives the communication generation technique for nonlinear references; Section 5 evaluates the proposed techniques in KLCoMP; Section 6 draws our conclusions.
doi:10.1007/s11432-010-0074-0