An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral Compilation

Yukinori Sato, Tomoya Yuki, Toshio Endo
2019 ACM Transactions on Architecture and Code Optimization (TACO)  
On modern many-core CPUs, performance tuning against complex memory subsystems and scalability for parallelism is mandatory to achieve their potential. In this article, we focus on loop tiling, which plays an important role in performance tuning, and develop a novel framework that analytically models the load balance and empirically autotunes unpredictable cache behaviors through iterative polyhedral compilation using LLVM/Polly. From an evaluation on many-core CPUs, we demonstrate that our autotuner achieves a performance superior to those that use conventional static approaches and well-known autotuning heuristics. Moreover, our autotuner achieves almost the same performance as a brute-force search-based approach.
CCS Concepts: • General and reference → Performance; • Hardware → Emerging tools and methodologies; • Software and its engineering → Search-based software engineering;
doi:10.1145/3293449 fatcat:be4kecwhw5fnbi5krgabv7jqba