Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading [chapter]

Sunil Shrestha, Joseph Manzano, Andres Marquez, John Feo, Guang R. Gao
2015 Lecture Notes in Computer Science  
In this paper, we have developed a novel methodology that takes multithreaded many-core designs into consideration to better utilize memory and processing resources and to improve memory residence in tileable applications. It takes advantage of polyhedral analysis and transformation in the form of PLUTO[6], combined with a highly optimized fine-grain tile runtime, to exploit parallelism at all levels. The main contributions of this paper include the introduction of multi-hierarchical tiling techniques ... that increase intra-tile parallelism, and a dataflow-inspired runtime library that allows the expression of parallel tiles with an efficient synchronization registry. Our current implementation shows performance improvements of up to 32.25% on an Intel Xeon Phi board over instances produced by state-of-the-art compiler frameworks for selected stencil applications.
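The abstract does not describe the runtime's interface, so the following is only a minimal Python sketch, under our own assumptions, of the general idea of firing tiles in dataflow order from a per-tile dependence counter. Here that counter stands in for the "synchronization registry." The kernel (a two-predecessor wavefront, `a[i][j] = a[i-1][j] + a[i][j-1]`) and all names (`run_tile`, `dataflow_tiled`, `registry`) are hypothetical and do not come from the paper. The tiles are plain rectangles rather than the paper's jagged tiles, and intra-tile parallelism is not modeled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MOD = 1_000_000_007  # keeps values bounded; any modulus would do


def init_grid(n, m):
    # Row 0 and column 0 are fixed boundary values; the interior is overwritten.
    return [[1] * m for _ in range(n)]


def reference(n, m):
    """Untiled sequential sweep of a[i][j] = a[i-1][j] + a[i][j-1]."""
    a = init_grid(n, m)
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = (a[i - 1][j] + a[i][j - 1]) % MOD
    return a


def run_tile(a, ti, tj, T, n, m):
    """Compute every interior point of tile (ti, tj) in row-major order."""
    for i in range(max(1, ti * T), min(n, (ti + 1) * T)):
        for j in range(max(1, tj * T), min(m, (tj + 1) * T)):
            a[i][j] = (a[i - 1][j] + a[i][j - 1]) % MOD


def dataflow_tiled(n, m, T, workers=4):
    """Fire each tile as soon as its predecessor tiles finish (n, m >= 1)."""
    a = init_grid(n, m)
    nti, ntj = -(-n // T), -(-m // T)  # tiles per dimension (ceiling division)
    # Hypothetical registry: count of unresolved input dependences per tile.
    registry = {(ti, tj): (ti > 0) + (tj > 0)
                for ti in range(nti) for tj in range(ntj)}
    lock = threading.Lock()
    done = threading.Event()
    remaining = [nti * ntj]
    errors = []

    with ThreadPoolExecutor(max_workers=workers) as pool:
        def fire(tile):
            try:
                ti, tj = tile
                run_tile(a, ti, tj, T, n, m)
                ready = []
                with lock:
                    # Signal the successor tiles; those reaching zero become ready.
                    for s in ((ti + 1, tj), (ti, tj + 1)):
                        if s in registry:
                            registry[s] -= 1
                            if registry[s] == 0:
                                ready.append(s)
                    remaining[0] -= 1
                    if remaining[0] == 0:
                        done.set()
                for s in ready:
                    pool.submit(fire, s)
            except BaseException as exc:  # surface worker failures to the caller
                errors.append(exc)
                done.set()

        pool.submit(fire, (0, 0))  # the only tile with no predecessors
        done.wait()
    if errors:
        raise errors[0]
    return a
```

In this sketch, all tiles on the same anti-diagonal of the tile grid become ready together and may run concurrently. No global barrier is used: each tile waits only on its own counter. Because of CPython's global interpreter lock, the threads illustrate the scheduling structure rather than deliver real speedup. Comparing `dataflow_tiled(37, 53, 8)` against `reference(37, 53)` is a simple way to check that the dataflow order respects every dependence.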
doi:10.1007/978-3-319-17473-0_11