Run-time reference clustering for cache performance optimization

W.K. Kaplow, B.K. Szymanski, P. Tannenbaum, V.K. Decyk
Proceedings of IEEE International Symposium on Parallel Algorithms Architecture Synthesis
We introduce a method for improving the cache performance of irregular computations in which data are referenced through indirection arrays defined at run time. Such computations often arise in scientific problems. The method, called Run-Time Reference Clustering (RTRC), is a run-time analog of the compile-time blocking used for dense matrix problems. RTRC uses the data partitioning and re-mapping techniques that are part of distributed-memory multiprocessor codes designed to minimize interprocessor communication. Re-mapping each set of local data decreases cache misses in the same way that re-mapping the global data decreases off-processor references. We demonstrate the applicability and performance of the RTRC technique on several prevalent applications: Sparse Matrix-Vector Multiply, Particle-In-Cell, and CHARMM-like codes. Performance results show that single-node execution performance can be improved by as much as 35%.

... ing the run-time dependent remote access requirements of the application, and providing efficient facilities to perform the communication. These problems are addressed in the CHAOS/PARTI run-time and compilation methods [4, 12, 10]. The essential technique is the inspector/executor model, in which the inspector determines which references are required for execution, and the executor performs the communication and the actual computation.
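
To make the idea of clustering references at run time concrete, the following is a minimal single-node sketch in the inspector/executor style, applied to a sparse matrix-vector multiply stored in coordinate form. It is an illustration under stated assumptions, not the authors' RTRC implementation: the function names `inspector` and `executor`, the `block_size` parameter, and the choice to cluster by sorting on the block of `x` that each reference touches are all hypothetical.

```python
def inspector(col, block_size):
    """Scan the indirection array and return a visiting order for the
    nonzeros, clustered by the block of x that each one references.

    Hypothetical helper: block_size stands in for the number of x
    elements assumed to fit in cache together.
    """
    # sorted() is stable, so nonzeros keep their original relative
    # order inside each block.
    return sorted(range(len(col)), key=lambda k: col[k] // block_size)


def executor(order, row, col, val, x, n_rows):
    """Perform y[row[k]] += val[k] * x[col[k]] in the given order."""
    y = [0.0] * n_rows
    for k in order:
        y[row[k]] += val[k] * x[col[k]]
    return y


# A small 3 x 6 sparse matrix in coordinate (COO) form; row and col are
# the run-time defined indirection arrays.
row = [0, 1, 2, 0, 1, 2]
col = [5, 0, 4, 1, 5, 0]
val = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Inspector phase: done once, reused while the access pattern is unchanged.
order = inspector(col, block_size=2)

# Executor phase: same arithmetic, visited in the clustered order.
y_clustered = executor(order, row, col, val, x, n_rows=3)
y_original = executor(range(len(col)), row, col, val, x, n_rows=3)
```

In this sketch the clustered order visits all references to one block of `x` before moving to the next, which is the run-time counterpart of blocking a dense loop nest. The cost of the inspector is only worthwhile when the same indirection arrays are reused over many executor passes, and pure Python will not show the cache effect itself; the code only illustrates the reordering.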
doi:10.1109/aispas.1997.581623 fatcat:7lr73427vranvmdm5osnzm6eky