An Approach for Semiautomatic Locality Optimizations Using OpenMP [chapter]

Jens Breitbart
2012 Lecture Notes in Computer Science  
The processing power of multicore CPUs increases at a high rate, whereas memory bandwidth is falling behind. Almost all modern processors use multiple cache levels to hide the latency of slow main memory; however, cache efficiency depends directly on data locality. This paper studies a possible way to incorporate data locality exposure into the syntax of the parallel programming system OpenMP. We study data locality optimizations on two applications: matrix multiplication and a Gauß-Seidel stencil. We show that only small changes to OpenMP are required to expose data locality so that a compiler can transform the code. Our notion of tiled loops allows developers to easily describe data locality, even in scenarios with non-trivial data dependencies. Furthermore, we describe two new optimization techniques. The first explicitly uses a form of local memory to prevent conflict cache misses, whereas the second modifies the wavefront parallel programming pattern with dynamically sized blocks to increase the number of parallel tasks. As an additional contribution, we explore the benefit of using multiple levels of tiling.
doi:10.1007/978-3-642-28145-7_29 fatcat:pnlzpvfi65fajih63nchg7wqoi
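
For readers unfamiliar with the loop tiling that the abstract refers to, the following is a minimal sketch of a hand-tiled matrix multiplication using only standard OpenMP. It is not the syntax extension proposed in the chapter, which this record does not show; the function name `matmul_tiled`, the row-major layout, and the `tile` parameter are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper (not from the chapter): computes C += A * B for
 * n x n row-major matrices, visiting the iteration space in square tiles
 * so that each tile's working set can stay in cache. */
static void matmul_tiled(const double *a, const double *b, double *c,
                         int n, int tile)
{
    assert(n >= 0 && tile > 0);

    /* Parallelize over row tiles only: each thread writes a disjoint
     * set of rows of C, so no synchronization is needed. */
    #pragma omp parallel for schedule(static)
    for (int ii = 0; ii < n; ii += tile) {
        for (int kk = 0; kk < n; kk += tile) {
            for (int jj = 0; jj < n; jj += tile) {
                /* Clamp tile bounds so n need not be a multiple of tile. */
                int i_end = (ii + tile < n) ? ii + tile : n;
                int k_end = (kk + tile < n) ? kk + tile : n;
                int j_end = (jj + tile < n) ? jj + tile : n;

                for (int i = ii; i < i_end; ++i) {
                    for (int k = kk; k < k_end; ++k) {
                        double aik = a[i * n + k];
                        for (int j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The caller is expected to zero-initialize `c` if a plain product is wanted. A suitable `tile` value depends on the cache sizes of the target machine and would normally be found by measurement; compiling without OpenMP support simply ignores the pragma and runs the same loops sequentially.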